ACT-PickOrange

An ACT (Action Chunking Transformer) policy trained from scratch on the LeIsaac SO-101 PickOrange task.

🔗 Project repos:

vitorcen/isaaclab-experience: Isaac Lab + LeIsaac multi-policy benchmark (parent project)
vitorcen/LeIsaac-Training: LeIsaac fork (training scripts + design docs)

TL;DR

Task: "Pick up the orange and place it on the plate". The single-arm SO-101 picks 3 oranges one after another and places each on a plate.
Dataset: LightwheelAI/leisaac-pick-orange, 60 teleoperated demonstration episodes.
Architecture: ACT with chunk_size=100, ~52M parameters; pure vision + joint state → action-chunk regression (no LLM, no diffusion).
Training: lerobot v0.4.0, batch=8, lr=1e-5, 20k steps, image augmentation disabled, ~10 h on an RTX 4090. This ckpt = step 18000 (the sweet spot).
Eval: Isaac Sim 5.1 + LeIsaac, 5 rounds × 5 runs pooled = 33/75 oranges = 44.0% per-orange success (95% CI [29.5%, 58.5%]).
⚠️ Critical inference setting: policy_action_horizon=70. The horizon=32 recommended for the old v0.5.2 ckpt does not apply to this v0.4.0 ckpt; see the Critical inference caveat section.

🌳 Branch layout

This repo tracks two checkpoints, capturing both ends of the framework-drift story:

Branch	lerobot version	Training step	Best horizon	🍊 per-orange p	Notes
main (this ckpt)	v0.4.0	18000	70	0.440 (33/75, 5-run pool)	current canonical
`lerobot-v052-ckpt-10k`	v0.5.2	10000	32 (old recommendation)	0.267 (4/15, single 5-round run)	archived for the framework-drift study

See the Framework drift section below.

Highlights

Strict 5-round × 5-run pooled statistics confirm 44.0% per-orange success (95% CI [29.5%, 58.5%]), significantly better than the public shadowHokage ckpt at 18.3% (95% CI [10.6%, 26.0%]). Welch t-test (per-episode, removing episode clustering) p=0.034; two-proportion Z test p=0.008.
Exposes lerobot v0.4 → v0.5 framework drift: with the same dataset, seed, and config, switching only the lerobot version, the v0.5.2-trained ckpt drops to 18-27% per-orange (the level shadowHokage actually achieves); pinning back to v0.4.0 restores 44%. See the Framework drift section at the bottom.
Exposes a hidden pitfall in LeIsaac's default policy_action_horizon=16: an ACT with chunk_size=100 needs a per-ckpt sweep to find the best h (h=70 for this ckpt; ckpts from different training curves have different optimal h).
No image augmentation, no weight-decay tuning, no special tricks: a clean ACT baseline.

Training recipe

Item	Value
Dataset	`LightwheelAI/leisaac-pick-orange` (60 ep, dual-cam 480×640 RGB + 6-DOF state, 30 Hz)
Policy	`act` (LeRobot implementation)
lerobot version	v0.4.0 (pinned to avoid framework drift)
Backbone	ResNet18 vision encoder + Transformer encoder/decoder
`chunk_size`	100
`n_action_steps`	100
Batch size	8
Optimizer	AdamW
Learning rate	1e-5 (constant)
Steps	20,000 (this ckpt = step 18000, the sweet spot found by a checkpoint sweep)
Image augmentation	disabled
Hardware	RTX 4090 (24 GB)
Wall-clock	~10 hours
Recipe credit	shadowHokage/act_policy (v0.4-era recipe prototype)

The training entrypoint script lives in our LeIsaac fork: scripts/training/act/train.sh.

Eval results

5-round × 5-run pooled stats (25 episodes total)

Under the 5-round protocol, ACT shows a single-run spread of about ±40% (same ckpt, same horizon: 2-13/15 across 5 runs), so canonical numbers must be pooled across multiple runs.

Config	🍊 per-orange p	per-episode mean	95% CI (per-orange)
wsagi/ACT-PickOrange v0.4.0 ckpt-18k h=70 (this ckpt, 5 runs)	0.440	1.32/ep	[0.295, 0.585]
shadowHokage/act_policy h={16,32,64,70} (4 runs)	0.183	0.55/ep	[0.106, 0.260]

Significance:

Two-proportion Z test (per-orange iid): Z = 2.67, p = 0.008 ✅
Welch t-test (per-episode, removing episode-cluster over-dispersion): t = 2.13, df ≈ 38, p = 0.034 ✅
Effect ratio: 2.40× (0.440 / 0.183)

Per-episode orange-count distribution (0-3)

As a chunked policy, ACT makes trajectory-level decisions, so per-orange outcomes are not iid: once the trajectory enters the correct mode, all 3 oranges tend to succeed in a streak; once it goes off track, the whole episode is lost. The observed distribution is bimodal, not binomial:

oranges/ep	observed (25 ep)	Binomial(3, 0.440) expected	observed / expected
0	11	4.4	2.51× (over-dispersed)
1	2	10.3	0.19× (under)
2	5	8.1	0.61× (under)
3	7	2.1	3.29× (over-dispersed)

Both tails (0 and 3 oranges) occur 2.5-3.3× more often than the binomial predicts, while the middle bins (1 and 2 oranges) occur far less often (0.19× and 0.61×): the signature of a bimodal, U-shaped distribution.
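The expected column and the ratios can be reproduced in a few lines. This is a minimal sketch, assuming each of the 3 oranges in an episode is an independent Bernoulli(0.440) trial (the null model the table compares against); the helper name `binomial_expected` is illustrative and not part of LeIsaac or LeRobot.

```python
from math import comb


def binomial_expected(p: float, n_oranges: int = 3, n_episodes: int = 25) -> list[float]:
    """Expected number of episodes with k = 0..n_oranges successes,
    if every orange were an independent Bernoulli(p) trial."""
    return [
        n_episodes * comb(n_oranges, k) * p**k * (1 - p) ** (n_oranges - k)
        for k in range(n_oranges + 1)
    ]


# Episodes with 0, 1, 2, 3 oranges placed (observed column of the table).
observed = [11, 2, 5, 7]
expected = binomial_expected(0.440)

if __name__ == "__main__":
    for k, (o, e) in enumerate(zip(observed, expected)):
        print(f"{k} oranges: observed {o:2d}, expected {e:4.1f}, ratio {o / e:.2f}x")
```

Ratios far above 1 at both ends and below 1 in the middle are what over-dispersion relative to the binomial looks like; a per-episode test (such as the Welch test above) is the safer comparison under that clustering.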

Per-run datapoints

ckpt-18k, h=70, 5 runs (25 episodes total):

run1: [3, 3, 3, 2, 2] = 13/15 (lucky tail, P≈0.003% under binomial)
run2: [1, 1, 0, 0, 0] =  2/15
run3: [2, 0, 3, 0, 3] =  8/15
run4: [3, 0, 0, 2, 0] =  5/15
run5: [0, 0, 3, 2, 0] =  5/15

Range 2-13/15 (≈ ±40%); pooled total = 33/75.
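A minimal sketch of the pooling arithmetic, using only the per-episode counts listed above; `pool` is a hypothetical helper, not a LeIsaac eval API. It pools at the orange level and the episode level, and rebuilds the 0-3 histogram used in the distribution table.

```python
from collections import Counter

# Per-episode orange counts for ckpt-18k, h=70 (copied from the list above).
RUNS = {
    "run1": [3, 3, 3, 2, 2],
    "run2": [1, 1, 0, 0, 0],
    "run3": [2, 0, 3, 0, 3],
    "run4": [3, 0, 0, 2, 0],
    "run5": [0, 0, 3, 2, 0],
}


def pool(runs: dict[str, list[int]], oranges_per_episode: int = 3) -> dict:
    """Pool several 5-round runs into one set of summary statistics."""
    episodes = [n for eps in runs.values() for n in eps]
    successes = sum(episodes)
    attempts = oranges_per_episode * len(episodes)
    return {
        "per_run": {name: sum(eps) for name, eps in runs.items()},
        "successes": successes,
        "attempts": attempts,
        "per_orange_p": successes / attempts,
        "per_episode_mean": successes / len(episodes),
        "histogram": Counter(episodes),  # episodes with 0, 1, 2, 3 oranges
    }


if __name__ == "__main__":
    stats = pool(RUNS)
    print(stats["per_run"])
    print(f"{stats['successes']}/{stats['attempts']} = {stats['per_orange_p']:.3f} per orange")
    print(f"{stats['per_episode_mean']:.2f} oranges/episode, histogram {dict(stats['histogram'])}")
```

Reporting the per-run totals next to the pooled value keeps the ±40% single-run spread visible instead of hiding it behind one number.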

Test setup: Isaac Sim 5.1, task LeIsaac-SO101-PickOrange-v0, episode_length_s=120, step_hz=30, dual-camera observations.

⚠️ Critical inference caveat

The best horizon for this v0.4.0 ckpt is 70, not the 32 recommended for the old v0.5.2 ckpt. Each training run produces a ckpt with a different optimal inference horizon, so a per-ckpt sweep is required.
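One way to script such a sweep is to generate one eval command per (horizon, run) pair. This is a sketch under assumptions: the flags are copied from the eval command in the Usage section, `build_eval_cmd` and `sweep_plan` are hypothetical helper names, and the candidate horizons are simply the values mentioned in this card. It only builds and prints commands; it does not launch Isaac Sim.

```python
import shlex

# Flags copied from the LeIsaac eval command in the Usage section; only the horizon varies.
BASE_FLAGS = [
    "--task=LeIsaac-SO101-PickOrange-v0",
    "--eval_rounds=5",
    "--episode_length_s=120",
    "--step_hz=30",
    "--policy_type=lerobot-act",
    "--policy_host=127.0.0.1",
    "--policy_port=8080",
    "--policy_checkpoint_path=wsagi/ACT-PickOrange",
    "--policy_language_instruction=Pick up the orange and place it on the plate",
    "--device=cuda",
    "--enable_cameras",
]


def build_eval_cmd(horizon: int) -> list[str]:
    """Hypothetical helper: argv for one 5-round eval at the given action horizon."""
    return ["bash", "scripts/evaluation/run_eval.sh", "--", *BASE_FLAGS,
            f"--policy_action_horizon={horizon}"]


def sweep_plan(horizons, runs_per_horizon: int = 3):
    """One command per (horizon, run), so each horizon can be pooled over >= 3 runs."""
    return [(h, r, build_eval_cmd(h)) for h in horizons for r in range(runs_per_horizon)]


if __name__ == "__main__":
    for h, r, cmd in sweep_plan([16, 32, 64, 70]):
        print(f"# h={h} run={r + 1}")
        print(shlex.join(cmd))
```

To execute a plan entry, one option is `subprocess.run(cmd, cwd="LeIsaac", check=True)` with the policy server already running; the `cwd` mirrors the `cd LeIsaac` step and is an assumption about your checkout layout.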

Root cause

Each ACT chunk outputs 100 steps of actions, i.e. a complete plan. The LeRobot async client executes chunks receding-horizon style (a sliding window), re-querying the policy every policy_action_horizon steps. The action coherence within a chunk determines the best horizon: framework drift during training (dataloader RNG / loss normalization) changes the chunk coherence the ckpt learns, which in turn shifts the optimal re-planning frequency.
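The receding-horizon loop can be sketched in a few lines. This is a simplified synchronous model, not the LeRobot async client: the real client overlaps inference with execution and merges overlapping chunks, which this sketch ignores. `policy` is a stand-in callable that returns `chunk_size` actions, and `dummy_policy` is purely illustrative. With episode_length_s=120 and step_hz=30 (3600 steps), h=70 means 52 policy queries per episode, versus 225 at LeIsaac's default h=16.

```python
def run_receding_horizon(policy, episode_steps: int, horizon: int, chunk_size: int = 100):
    """Simplified synchronous receding-horizon execution of action chunks.

    Every `horizon` steps the policy is queried for a fresh chunk of
    `chunk_size` actions; only the first `horizon` actions are executed
    and the rest of the chunk is discarded.
    Returns (num_queries, executed_actions).
    """
    if not 1 <= horizon <= chunk_size:
        raise ValueError("horizon must be between 1 and chunk_size")
    executed = []
    queries = 0
    while len(executed) < episode_steps:
        chunk = policy(len(executed))  # one inference call at the current step
        queries += 1
        remaining = episode_steps - len(executed)
        # In a real rollout, env.step(action) would be called for each of these.
        executed.extend(chunk[:min(horizon, remaining)])
    return queries, executed


def dummy_policy(t: int, chunk_size: int = 100) -> list[int]:
    """Stand-in policy: each 'action' is just the timestep it was planned for."""
    return [t + i for i in range(chunk_size)]


if __name__ == "__main__":
    steps = 120 * 30  # episode_length_s * step_hz
    for h in (16, 32, 70):
        n_queries, _ = run_receding_horizon(dummy_policy, steps, h)
        print(f"h={h}: {n_queries} policy queries per episode")
```

A small h re-plans often but only ever uses the head of each chunk; a large h trusts more of the chunk's tail, which only pays off when the ckpt's chunks are internally coherent. That trade-off is why the best h must be found per ckpt.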

Usage

1. Start the LeRobot async policy_server (lerobot v0.4.0)

conda create -n lerobot-v040 python=3.10 -y && conda activate lerobot-v040
pip install lerobot==0.4.0  # Pin the version! Avoids framework drift
python -m lerobot.async_inference.policy_server --host 0.0.0.0 --port 8080

2. Launch the LeIsaac eval client

Via our vitorcen/LeIsaac-Training fork:

cd LeIsaac
bash scripts/evaluation/run_eval.sh -- \
    --task=LeIsaac-SO101-PickOrange-v0 \
    --eval_rounds=5 \
    --episode_length_s=120 \
    --step_hz=30 \
    --policy_type=lerobot-act \
    --policy_host=127.0.0.1 --policy_port=8080 \
    --policy_checkpoint_path=wsagi/ACT-PickOrange \
    --policy_action_horizon=70 \
    --policy_language_instruction="Pick up the orange and place it on the plate" \
    --device=cuda --enable_cameras

Framework drift — lerobot v0.4 vs v0.5

This ckpt was retrained on lerobot v0.4.0 (pinned), not on the latest v0.5.x. The reason:

Training framework	5-round per-orange p	Significance
lerobot v0.4.0 (this ckpt)	0.440 (5-run pool, 25 ep)	baseline
lerobot v0.5.2 + 2 patches	0.267 (4/15 single 5-round)	-39% vs v0.4.0 (left-tail p≈0.1%)
shadowHokage (v0.4 era, 2026-01)	0.183 (4-h sweep, 20 ep)	-58% vs v0.4.0, Z=2.67 p=0.008
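The relative drops in the Significance column are plain ratios against the v0.4.0 baseline. A minimal sketch; `relative_change` is an illustrative helper, and the inputs are the point estimates from the table.

```python
BASELINE = 0.440  # lerobot v0.4.0, this ckpt (5-run pool)


def relative_change(p: float, baseline: float = BASELINE) -> float:
    """Relative change of a per-orange success rate versus the v0.4.0 baseline."""
    return p / baseline - 1.0


if __name__ == "__main__":
    for label, p in [("lerobot v0.5.2 + 2 patches", 0.267), ("shadowHokage", 0.183)]:
        print(f"{label}: {relative_change(p):+.0%} vs v0.4.0")
```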

Key findings:

lerobot PR #3406 (a8b72d96) changed the dataloader (persistent_workers / uint8 / prefetch); merged 2026-04-19
lerobot PR #3442 (1add4606) changed the ACT padding loss; merged 2026-04-23
Both PRs landed in v0.5.0 (2026-04-26); pinning back to v0.4.0 restores 0.440 per-orange

The full ablation and a 3-model brainstorm are in our design doc: act_finetune_pick_orange.html.

Limitations

The dataset is out-of-distribution (OOD) for the 2nd and 3rd oranges: each of the 60 episodes contains a single demonstration of placing the N-th orange, so state coverage for the 2nd and 3rd oranges is about an order of magnitude sparser than for the 1st. Even with horizon=70 and 5-run pooling, accuracy degrades linearly with orange index. This is a data issue, not a model issue.
Single-run spread of ±40% under the 5-round protocol: no single 5-round number (including the 13/15 lucky tail) counts as evidence; pool at least 3 runs.
No image augmentation or domain randomization, so real-world transfer is likely weak. This ckpt is only validated in Isaac Sim simulation; real-robot deployment is not guaranteed.

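Acknowledgments

The LeIsaac team and LightwheelAI, for the task environment and dataset
The LeRobot team, for the ACT implementation and the async inference framework
shadowHokage, for the public training recipe used as the reproduction baseline (which exposed the framework-drift issue)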

Citation

@inproceedings{zhao2023learning,
  title={Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware},
  author={Zhao, Tony Z. and Kumar, Vikash and Levine, Sergey and Finn, Chelsea},
  booktitle={Robotics: Science and Systems},
  year={2023}
}

License

Apache-2.0


Safetensors · Model size: 51.7M params · Tensor type: F32
