ACT-PickOrange

An ACT (Action Chunking Transformer) policy trained from scratch on the LeIsaac SO-101 PickOrange task.

ACT-PickOrange — SO-101 in Isaac Sim

🔗 Project repos

TL;DR

  • Task: "Pick up the orange and place it on the plate". A single-arm SO-101 picks up 3 oranges one at a time and places each on a plate.
  • Dataset: LightwheelAI/leisaac-pick-orange, 60 teleoperated demonstration episodes.
  • Architecture: ACT with chunk_size=100, ~52M parameters; pure vision + joint state → action-chunk regression (no LLM, no diffusion).
  • Training: lerobot v0.4.0, batch=8, lr=1e-5, 20k steps, image augmentation disabled, ~10 h on an RTX 4090. This checkpoint is step 18000 (the sweet spot).
  • Eval: Isaac Sim 5.1 + LeIsaac, 5-round × 5-run pooled: 33/75 oranges = 44.0% per-orange success (95% CI [29.5%, 58.5%]).
  • ⚠️ Critical inference setting: policy_action_horizon=70 (the horizon=32 tuned for the old v0.5.2 checkpoint does not apply to this v0.4.0 checkpoint; see the Critical inference caveat section).

🌳 Branch layout

Two checkpoints are tracked in this repo, capturing both ends of the framework drift story:

| Branch | lerobot version | Training step | Best horizon | 🍊 Per-orange p | Notes |
|---|---|---|---|---|---|
| main (this ckpt) | v0.4.0 | 18000 | 70 | 0.440 (33/75, 5-run pool) | current canonical |
| lerobot-v052-ckpt-10k | v0.5.2 | 10000 | 32 (old recommendation) | 0.267 (4/15, single 5-round run) | archived for framework-drift study |

See the Framework drift section below.

Highlights

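  • Confirmed with 5-round × 5-run pooled statistics: 44.0% per-orange (95% CI [29.5%, 58.5%]), significantly better than the public shadowHokage checkpoint at 18.3% (95% CI [10.6%, 26.0%]). Welch t-test (per-episode, which removes episode clustering): p=0.034; two-proportion Z test: p=0.008.
  • Exposes a lerobot v0.4 → v0.5 framework drift: with the same dataset, seed, and config, switching only the lerobot version makes the v0.5.2-trained checkpoint drop to 18-27% per-orange (the level shadowHokage actually reaches); pinning back to v0.4.0 restores 44%. See the Framework drift section near the bottom.
  • Exposes a hidden pitfall in LeIsaac's default policy_action_horizon=16: an ACT policy with chunk_size=100 needs a per-checkpoint sweep to find the best h (h=70 for this checkpoint; checkpoints from different training trajectories have different optimal h).
  • No image augmentation, no weight-decay tuning, no special tricks: a clean ACT baseline.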

Training recipe

| Item | Value |
|---|---|
| Dataset | LightwheelAI/leisaac-pick-orange (60 ep, dual-cam 480×640 RGB + 6 DOF state, 30 Hz) |
| Policy | act (LeRobot implementation) |
| lerobot version | v0.4.0 (pinned to avoid framework drift) |
| Backbone | ResNet18 vision encoder + Transformer encoder/decoder |
| chunk_size | 100 |
| n_action_steps | 100 |
| Batch size | 8 |
| Optimizer | AdamW |
| Learning rate | 1e-5 (constant) |
| Steps | 20,000 (this ckpt = step 18000, the sweet spot found by a sweep) |
| Image augmentation | disabled |
| Hardware | RTX 4090 (24 GB) |
| Wall-clock | ~10 hours |
| Recipe credit | shadowHokage/act_policy (v0.4-era recipe prototype) |

The training entrypoint script lives in our LeIsaac fork: scripts/training/act/train.sh.

Eval results

5-round × 5-run pooled stats (25 episodes total)

The 5-round protocol shows large single-run variance for ACT: with the same checkpoint and the same horizon, 5 runs ranged from 2/15 to 13/15 (roughly ±40%), so canonical numbers must be pooled across multiple runs.

| Config | 🍊 Per-orange p | Per-episode mean | 95% CI (per-orange) |
|---|---|---|---|
| wsagi/ACT-PickOrange v0.4.0 ckpt-18k h=70 (this ckpt, 5 runs) | 0.440 | 1.32/ep | [0.295, 0.585] |
| shadowHokage/act_policy h={16,32,64,70} (4 runs) | 0.183 | 0.55/ep | [0.106, 0.260] |

Significance

  • Two-proportion Z test (per-orange iid): Z = 2.67, p = 0.008
  • Welch t-test (per-episode, removes episode-cluster over-dispersion): t = 2.13, df ≈ 38, p = 0.034
  • Effect ratio: 2.20×

Per-episode oranges distribution (0-3 oranges per episode)

An ACT chunk policy makes trajectory-level decisions rather than independent per-orange ones. Once the trajectory enters the correct mode, all 3 oranges tend to succeed as a streak; once it goes off-track, the whole episode is lost. The observed distribution is therefore bimodal, not binomial:

| Oranges/ep | Observed (25 ep) | Expected under Binomial(3, 0.440) | Observed / expected |
|---|---|---|---|
| 0 | 11 | 4.4 | 2.51× (over-dispersed) |
| 1 | 2 | 10.3 | 0.19× (under) |
| 2 | 5 | 8.1 | 0.61× (under) |
| 3 | 7 | 2.1 | 3.29× (over-dispersed) |

Both tails (0 and 3 oranges) occur 2.5-3.3× more often than the binomial predicts, while the middle bins (1 and 2 oranges) occur only 0.19-0.61× as often: a bimodal, U-shaped signature.
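The expected counts in the table can be reproduced with a few lines of Python. This is a minimal sketch that assumes each orange is an independent trial with success probability 0.440 (the very assumption the observed data contradicts); the function name `binomial_expected` is illustrative only.

```python
from math import comb

def binomial_expected(n_trials, p, n_episodes):
    """Expected number of episodes with k successes, for k = 0..n_trials."""
    return [
        n_episodes * comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
        for k in range(n_trials + 1)
    ]

# Observed episode counts for 0, 1, 2, 3 oranges (from the table above).
observed = [11, 2, 5, 7]
expected = binomial_expected(3, 0.440, sum(observed))

for k, (obs, exp) in enumerate(zip(observed, expected)):
    print(f"{k} oranges: observed={obs:2d} expected={exp:4.1f} ratio={obs / exp:.2f}x")
```

The printed ratios correspond to the last column of the table.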

Per-run datapoints

ckpt-18k h=70 5 runs (25 episodes total):

run1: [3, 3, 3, 2, 2] = 13/15 (lucky tail, P≈0.003% under binomial)
run2: [1, 1, 0, 0, 0] =  2/15
run3: [2, 0, 3, 0, 3] =  8/15
run4: [3, 0, 0, 2, 0] =  5/15
run5: [0, 0, 3, 2, 0] =  5/15

The per-run range of 2-13/15 is a spread of roughly ±40%; the pooled mean is 33/75.
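As a cross-check, the pooled figures can be recomputed directly from the per-run lists above. This is only a sketch of the bookkeeping; the variable names are illustrative and not taken from the evaluation scripts.

```python
from collections import Counter

ORANGES_PER_EPISODE = 3

# Oranges placed per episode, copied from the per-run datapoints above.
runs = {
    "run1": [3, 3, 3, 2, 2],
    "run2": [1, 1, 0, 0, 0],
    "run3": [2, 0, 3, 0, 3],
    "run4": [3, 0, 0, 2, 0],
    "run5": [0, 0, 3, 2, 0],
}

per_run = {name: sum(eps) for name, eps in runs.items()}
episodes = [n for eps in runs.values() for n in eps]

pooled_success = sum(episodes)
pooled_total = ORANGES_PER_EPISODE * len(episodes)
pooled_rate = pooled_success / pooled_total
histogram = Counter(episodes)  # how many episodes ended with 0, 1, 2, 3 oranges

print(f"per-run totals (out of 15): {per_run}")
print(f"pooled: {pooled_success}/{pooled_total} = {pooled_rate:.3f}")
print(f"episodes by oranges placed: {dict(sorted(histogram.items()))}")
```

The histogram matches the observed column of the per-episode distribution table (11, 2, 5, 7).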

Test setup: Isaac Sim 5.1, task LeIsaac-SO101-PickOrange-v0, episode_length_s=120, step_hz=30, dual-cam observations.

⚠️ Critical inference caveat

The best horizon for this v0.4.0 checkpoint is 70 (not the 32 used for the old v0.5.2 checkpoint). Checkpoints from different training trajectories have different optimal inference horizons, so a per-checkpoint sweep is required.

Root cause

Each ACT chunk is a complete 100-step planned trajectory. The LeRobot async client executes it in a receding-horizon fashion, re-querying the policy every policy_action_horizon steps. How coherent the actions within a chunk are determines the best horizon. Framework drift during training (dataloader RNG, loss normalization) changes the chunk coherence baked into the checkpoint, which in turn shifts the optimal re-plan frequency.
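The receding-horizon behaviour described above can be illustrated with a toy loop. This is a simplified sketch, not the LeRobot async client: `fake_policy` is a hypothetical stand-in for the ACT policy, and the loop assumes the client simply executes the first `horizon` actions of each 100-step chunk and discards the rest, with the episode running its full length.

```python
CHUNK_SIZE = 100  # actions per ACT chunk, as in the training recipe

def fake_policy(step):
    """Hypothetical stand-in for ACT: returns CHUNK_SIZE placeholder actions."""
    return [(step, i) for i in range(CHUNK_SIZE)]

def run_episode(total_steps, horizon):
    """Execute the first `horizon` actions of each chunk, then re-query."""
    assert 1 <= horizon <= CHUNK_SIZE
    queries = 0
    executed = []
    step = 0
    while step < total_steps:
        chunk = fake_policy(step)  # fresh plan from the current state
        queries += 1
        for action in chunk[:horizon]:  # the tail of the chunk is discarded
            if step >= total_steps:
                break
            executed.append(action)
            step += 1
    return queries, executed

total_steps = 120 * 30  # episode_length_s=120 at step_hz=30
query_counts = {h: run_episode(total_steps, h)[0] for h in (16, 32, 70, 100)}
print(query_counts)
```

A smaller horizon means more frequent re-planning (and more discarded plan tail); a larger horizon commits to more of each chunk before the policy sees a new observation.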

Recommended settings

--policy_type=lerobot-act
--policy_action_horizon=70                  # for THIS ckpt (v0.4.0 ckpt-18k); the old v0.5.2 ckpt uses 32
--policy_checkpoint_path=wsagi/ACT-PickOrange
--step_hz=30                                # matches the dataset's 30 Hz
--episode_length_s=120

Usage

1. Start the LeRobot async policy_server (lerobot v0.4.0)

conda create -n lerobot-v040 python=3.10 -y && conda activate lerobot-v040
pip install lerobot==0.4.0  # pin this version to avoid framework drift
python -m lerobot.async_inference.policy_server --host 0.0.0.0 --port 8080

2. Launch the LeIsaac eval on the client side

Via our vitorcen/LeIsaac-Training fork:

cd LeIsaac
bash scripts/evaluation/run_eval.sh -- \
    --task=LeIsaac-SO101-PickOrange-v0 \
    --eval_rounds=5 \
    --episode_length_s=120 \
    --step_hz=30 \
    --policy_type=lerobot-act \
    --policy_host=127.0.0.1 --policy_port=8080 \
    --policy_checkpoint_path=wsagi/ACT-PickOrange \
    --policy_action_horizon=70 \
    --policy_language_instruction="Pick up the orange and place it on the plate" \
    --device=cuda --enable_cameras
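Because the best horizon is checkpoint-specific, the command above typically has to be repeated over several horizons and several runs. The following sketch only generates the command lines; it reuses the flags shown above, while the candidate horizons (16, 32, 64, 70) and the three runs per horizon are assumptions borrowed from the numbers quoted elsewhere in this card, and `sweep_commands` is an illustrative helper, not part of LeIsaac.

```python
import shlex

# Flags copied from the eval command above, minus the horizon.
BASE_ARGS = [
    "bash", "scripts/evaluation/run_eval.sh", "--",
    "--task=LeIsaac-SO101-PickOrange-v0",
    "--eval_rounds=5",
    "--episode_length_s=120",
    "--step_hz=30",
    "--policy_type=lerobot-act",
    "--policy_host=127.0.0.1", "--policy_port=8080",
    "--policy_checkpoint_path=wsagi/ACT-PickOrange",
    "--policy_language_instruction=Pick up the orange and place it on the plate",
    "--device=cuda", "--enable_cameras",
]

def sweep_commands(horizons, runs_per_horizon):
    """Return one argv list per (horizon, run) pair."""
    commands = []
    for h in horizons:
        for _ in range(runs_per_horizon):
            commands.append(BASE_ARGS + [f"--policy_action_horizon={h}"])
    return commands

commands = sweep_commands([16, 32, 64, 70], runs_per_horizon=3)
for argv in commands:
    print(shlex.join(argv))  # quotes the instruction string for the shell
```

Each printed line can be run from the LeIsaac directory; the per-horizon results should then be pooled across runs before comparing horizons.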

Framework drift — lerobot v0.4 vs v0.5

This checkpoint was retrained on lerobot v0.4.0 (pinned) rather than the latest v0.5.x from the main repo. The reason:

| Training framework | Per-orange p (5-round protocol) | Significance |
|---|---|---|
| lerobot v0.4.0 (this ckpt) | 0.440 (5-run pool, 25 ep) | baseline |
| lerobot v0.5.2 + 2 patches | 0.267 (4/15, single 5-round run) | -39% vs v0.4.0 (left-tail p≈0.1%) |
| shadowHokage (v0.4 era, 2026-01) | 0.183 (4-h sweep, 20 ep) | -58% vs v0.4.0, Z=2.67, p=0.008 |

Key findings

  • lerobot PR #3406 (a8b72d96), which changes the dataloader (persistent_workers / uint8 / prefetch), was merged on 2026-04-19.
  • lerobot PR #3442 (1add4606), which changes the ACT padding loss, was merged on 2026-04-23.
  • Both PRs landed in v0.5.0 (2026-04-26); pinning back to v0.4.0 restores 0.440 per-orange.

The full ablation and a three-model brainstorm are in our design doc: act_finetune_pick_orange.html.

Limitations

  • Dataset OOD on the 2nd and 3rd oranges: the dataset has 60 episodes, each containing a single demonstration of placing the N-th orange. State coverage for the 2nd and 3rd oranges is about an order of magnitude sparser than for the 1st. Even at horizon=70 with 5-run pooling, accuracy degrades linearly with the orange index. This is a data issue, not a model issue.
  • ±40% single-run variance in the 5-round protocol: no single 5-round number (including the 13/15 lucky tail) counts as evidence; pool at least 3 runs.
  • No image augmentation and no domain randomization, so real-world transfer is likely weak. This checkpoint has only been validated in Isaac Sim; real-robot deployment is not guaranteed.

Related

Acknowledgments

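  • The LeIsaac team and LightwheelAI, for the task environment and dataset
  • The LeRobot team, for the ACT implementation and the async inference framework
  • shadowHokage, for the public training recipe used as the replication baseline (which exposed the framework drift issue)

Citation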

@inproceedings{zhao2023learning,
  title={Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware},
  author={Zhao, Tony Z. and Kumar, Vikash and Levine, Sergey and Finn, Chelsea},
  booktitle={Robotics: Science and Systems},
  year={2023}
}

License

Apache-2.0
