ACT-PickOrange
An ACT (Action Chunking Transformer) policy trained from scratch on the LeIsaac SO-101 PickOrange task.
🔗 Project repos:
- vitorcen/isaaclab-experience: Isaac Lab + LeIsaac multi-policy comparison (parent project)
- vitorcen/LeIsaac-Training: LeIsaac fork (training scripts + design docs)
TL;DR
- Task: "Pick up the orange and place it on the plate". A single-arm SO-101 picks 3 oranges sequentially and places each on a plate.
- Dataset: LightwheelAI/leisaac-pick-orange, 60 teleoperated demonstration episodes.
- Architecture: ACT with chunk_size=100, ~52M parameters; pure vision + joint state → action-chunk regression (no LLM, no diffusion).
- Training: lerobot v0.4.0, batch=8 / lr=1e-5 / 20k steps / image augmentation disabled, ~10 h on an RTX 4090. This ckpt = step 18000 (sweet spot).
- Eval: Isaac Sim 5.1 + LeIsaac, 5-round × 5-run pooled = 33/75 oranges = 44.0% per-orange success (95% CI [29.5%, 58.5%]).
- ⚠️ Critical inference setting: policy_action_horizon=70 (the horizon=32 recommended for the old v0.5.2 ckpt does not apply to this v0.4.0 ckpt; see Critical inference caveat).
🌳 Branch layout
Two checkpoints are tracked in this repo, capturing both ends of the framework drift story:
| Branch | lerobot version | Training step | Best horizon | 🍊 per-orange p | Notes |
|---|---|---|---|---|---|
| main (this ckpt) | v0.4.0 | 18000 | 70 | 0.440 (33/75, 5-run pool) | current canonical |
| lerobot-v052-ckpt-10k | v0.5.2 | 10000 | 32 (old recommendation) | 0.267 (4/15, single 5-round run) | archived for framework-drift study |
See the Framework drift section below.
Highlights
- Confirmed by strict 5-round × 5-run pooled statistics: 44.0% per-orange (95% CI [29.5%, 58.5%]), significantly better than the public shadowHokage ckpt at 18.3% (95% CI [10.6%, 26.0%]). Welch t-test (per-episode, removing episode clustering) p=0.034; two-proportion Z test p=0.008.
- Exposes lerobot v0.4 → v0.5 framework drift: with the same dataset, seed, and config, switching only the lerobot version drops the ckpt trained on v0.5.2 to 18-27% per-orange (the level shadowHokage actually achieves); pinning back to v0.4.0 restores 44%. See the Framework drift section at the bottom.
- Exposes a hidden pitfall in LeIsaac's default policy_action_horizon=16: an ACT with chunk_size=100 needs a per-ckpt sweep to find the best h (h=70 for this ckpt; ckpts produced by different training curves have different best h).
- No image augmentation, no weight-decay tuning, no special tricks: a clean ACT baseline.
Training recipe
| Item | Value |
|---|---|
| Dataset | LightwheelAI/leisaac-pick-orange (60 ep, dual-cam 480×640 RGB + 6 DOF state, 30 Hz) |
| Policy | act (LeRobot implementation) |
| lerobot version | v0.4.0 (pinned to avoid framework drift) |
| Backbone | ResNet18 vision encoder + Transformer encoder/decoder |
| chunk_size | 100 |
| n_action_steps | 100 |
| Batch size | 8 |
| Optimizer | AdamW |
| Learning rate | 1e-5 (constant) |
| Steps | 20,000 (this ckpt = step 18000, the sweet spot found by sweep) |
| Image augmentation | disabled |
| Hardware | RTX 4090 (24 GB) |
| Wall-clock | ~10 hours |
| Recipe credit | shadowHokage/act_policy (v0.4-era recipe prototype) |
The training entrypoint script lives in our LeIsaac fork: scripts/training/act/train.sh.
Eval results
5-round × 5-run pooled stats (25 episodes total)
The 5-round protocol shows roughly ±40% single-run variance for ACT (same ckpt and same horizon: range 2-13/15 across 5 runs), so canonical numbers must be pooled across multiple runs.
| Config | 🍊 per-orange p | per-episode mean | 95% CI (per-orange) |
|---|---|---|---|
| wsagi/ACT-PickOrange v0.4.0 ckpt-18k h=70 (this ckpt, 5 runs) | 0.440 | 1.32/ep | [0.295, 0.585] |
| shadowHokage/act_policy h={16,32,64,70} (4 runs) | 0.183 | 0.55/ep | [0.106, 0.260] |
Significance:
- Two-proportion Z test (per-orange iid): Z = 2.67, p = 0.008 ✅
- Welch t-test (per-episode, removes episode-cluster over-dispersion): t = 2.13, df ≈ 38, p = 0.034 ✅
- Effect ratio: 2.40× (0.440 / 0.183)
Per-episode oranges distribution (0-3 oranges)
ACT chunks make trajectory-level decisions rather than per-orange iid ones: once the trajectory enters the correct mode, all 3 oranges succeed in a streak; once it goes off-track, the entire episode is wasted. The observed distribution is bimodal, not binomial:
| oranges/ep | observed (25 ep) | Binomial(3, 0.440) expected | observed / expected |
|---|---|---|---|
| 0 | 11 | 4.4 | 2.51× (over-dispersed) |
| 1 | 2 | 10.3 | 0.19× (under) |
| 2 | 5 | 8.1 | 0.61× (under) |
| 3 | 7 | 2.1 | 3.29× (over-dispersed) |
Both tails (0 and 3) appear 2.5-3.3× more often than the binomial expectation, while the middle bins (1 and 2) appear at only 0.19× and 0.61× of the expected rate: a bimodal, U-shaped signature.
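The "expected" column can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the variable and function names are illustrative, and the observed counts are copied from the table above.

```python
from math import comb

N_EPISODES = 25            # pooled episodes (5 runs x 5 rounds)
ORANGES_PER_EPISODE = 3
P_ORANGE = 33 / 75         # pooled per-orange success rate (0.440)

# Observed number of episodes with k = 0, 1, 2, 3 oranges placed.
observed = [11, 2, 5, 7]


def binomial_expected(n_episodes, n_trials, p):
    """Expected episode counts per outcome if every orange were an iid trial."""
    return [
        n_episodes * comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
        for k in range(n_trials + 1)
    ]


expected = binomial_expected(N_EPISODES, ORANGES_PER_EPISODE, P_ORANGE)
ratios = [obs / exp for obs, exp in zip(observed, expected)]

for k, (obs, exp, ratio) in enumerate(zip(observed, expected, ratios)):
    print(f"{k} oranges: observed={obs:2d} expected={exp:5.1f} ratio={ratio:.2f}x")
```

Ratios well above 1 at k=0 and k=3 combined with ratios well below 1 at k=1 and k=2 are what the U-shape claim rests on.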
Per-run datapoints
ckpt-18k h=70 5 runs (25 episodes total):
run1: [3, 3, 3, 2, 2] = 13/15 (lucky tail, P≈0.003% under binomial)
run2: [1, 1, 0, 0, 0] = 2/15
run3: [2, 0, 3, 0, 3] = 8/15
run4: [3, 0, 0, 2, 0] = 5/15
run5: [0, 0, 3, 2, 0] = 5/15
Range 2-13/15 (roughly ±40%); pooled total = 33/75.
Test setup: Isaac Sim 5.1, task LeIsaac-SO101-PickOrange-v0, episode_length_s=120, step_hz=30, dual-cam observations.
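As a sanity check, the pooled numbers can be recomputed from the per-run lists. The sketch below is plain Python with illustrative variable names; the episode counts are copied from the runs listed above.

```python
from collections import Counter

ORANGES_PER_EPISODE = 3

# Oranges placed per episode, one list per 5-round run.
runs = {
    "run1": [3, 3, 3, 2, 2],
    "run2": [1, 1, 0, 0, 0],
    "run3": [2, 0, 3, 0, 3],
    "run4": [3, 0, 0, 2, 0],
    "run5": [0, 0, 3, 2, 0],
}

per_run_totals = {name: sum(eps) for name, eps in runs.items()}
episodes = [n for eps in runs.values() for n in eps]

pooled_successes = sum(episodes)
pooled_attempts = ORANGES_PER_EPISODE * len(episodes)
pooled_rate = pooled_successes / pooled_attempts
per_episode_mean = pooled_successes / len(episodes)
histogram = Counter(episodes)

print(per_run_totals)
print(f"pooled: {pooled_successes}/{pooled_attempts} = {pooled_rate:.3f}")
print(f"per-episode mean: {per_episode_mean:.2f}")
print({k: histogram[k] for k in range(ORANGES_PER_EPISODE + 1)})
```

The same episode list also yields the 0/1/2/3 histogram used in the per-episode distribution table.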
⚠️ Critical inference caveat
The best horizon for this v0.4.0 ckpt is 70 (not the 32 of the old v0.5.2 ckpt!). Each training trajectory produces a ckpt with a different optimal inference horizon, so a per-ckpt sweep is required.
Root cause
Each ACT chunk is a 100-step planned trajectory. The LeRobot async client consumes it in a receding-horizon (sliding) window, re-querying the policy every policy_action_horizon steps. The coherence of the actions within a chunk determines the best horizon: framework drift during training (dataloader RNG / loss normalization) changes the chunk coherence baked into the ckpt, which in turn shifts the optimal re-plan frequency.
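The receding-horizon idea can be illustrated with a toy loop. This is a simplified sketch, not the LeRobot async client's actual implementation: `dummy_policy` and `run_episode` are hypothetical names, and the "actions" are just integers so the bookkeeping is easy to follow.

```python
from typing import Callable, List, Tuple

CHUNK_SIZE = 100  # actions predicted per policy query (chunk_size)


def dummy_policy(current_step: int) -> List[int]:
    """Hypothetical stand-in for ACT: returns CHUNK_SIZE planned actions."""
    return [current_step + i for i in range(CHUNK_SIZE)]


def run_episode(
    policy: Callable[[int], List[int]], total_steps: int, horizon: int
) -> Tuple[List[int], int]:
    """Execute only the first `horizon` actions of each chunk, then re-query."""
    if not 1 <= horizon <= CHUNK_SIZE:
        raise ValueError("horizon must be between 1 and CHUNK_SIZE")
    executed: List[int] = []
    queries = 0
    step = 0
    while step < total_steps:
        chunk = policy(step)                      # fresh 100-step plan
        queries += 1
        n = min(horizon, total_steps - step)      # truncate at episode end
        executed.extend(chunk[:n])                # rest of the chunk is discarded
        step += n
    return executed, queries


TOTAL_STEPS = 120 * 30  # episode_length_s * step_hz
for h in (16, 32, 70):
    _, n_queries = run_episode(dummy_policy, TOTAL_STEPS, h)
    print(f"horizon={h:3d}: {n_queries} policy queries per episode")
```

A small horizon re-plans often and throws away most of each chunk; a large horizon commits to more of each plan. Which trade-off works best depends on how coherent the checkpoint's chunks are, which is why the value has to be swept per checkpoint.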
Recommended settings
--policy_type=lerobot-act
--policy_action_horizon=70 # for THIS ckpt (v0.4.0 ckpt-18k); the old v0.5.2 ckpt uses 32
--policy_checkpoint_path=wsagi/ACT-PickOrange
--step_hz=30 # matches the dataset's 30 Hz
--episode_length_s=120
Usage
1. Start the LeRobot async policy_server (lerobot v0.4.0)
conda create -n lerobot-v040 python=3.10 -y && conda activate lerobot-v040
pip install lerobot==0.4.0 # pin the version to avoid framework drift
python -m lerobot.async_inference.policy_server --host 0.0.0.0 --port 8080
2. Launch the LeIsaac eval on the client side
Via our vitorcen/LeIsaac-Training fork:
cd LeIsaac
bash scripts/evaluation/run_eval.sh -- \
--task=LeIsaac-SO101-PickOrange-v0 \
--eval_rounds=5 \
--episode_length_s=120 \
--step_hz=30 \
--policy_type=lerobot-act \
--policy_host=127.0.0.1 --policy_port=8080 \
--policy_checkpoint_path=wsagi/ACT-PickOrange \
--policy_action_horizon=70 \
--policy_language_instruction="Pick up the orange and place it on the plate" \
--device=cuda --enable_cameras
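Since the best horizon is checkpoint-specific, the same command can be generated for several horizons. The helper below is a hypothetical sketch that only builds and prints the command lines; the flags are the ones from the command above, while the horizon grid [16, 32, 64, 70] is an assumption chosen for illustration.

```python
import shlex

# Flags shared by every sweep point, copied from the eval command above.
BASE_ARGS = [
    "bash", "scripts/evaluation/run_eval.sh", "--",
    "--task=LeIsaac-SO101-PickOrange-v0",
    "--eval_rounds=5",
    "--episode_length_s=120",
    "--step_hz=30",
    "--policy_type=lerobot-act",
    "--policy_host=127.0.0.1", "--policy_port=8080",
    "--policy_language_instruction=Pick up the orange and place it on the plate",
    "--device=cuda", "--enable_cameras",
]


def build_sweep_commands(horizons, checkpoint):
    """Return one shell-quoted eval command per candidate horizon."""
    commands = []
    for h in horizons:
        args = BASE_ARGS + [
            f"--policy_checkpoint_path={checkpoint}",
            f"--policy_action_horizon={h}",
        ]
        commands.append(shlex.join(args))
    return commands


HORIZONS = [16, 32, 64, 70]  # assumed sweep grid, adjust per checkpoint
commands = build_sweep_commands(HORIZONS, "wsagi/ACT-PickOrange")
for cmd in commands:
    print(cmd)
```

Given the single-run variance reported above, each horizon would need several pooled runs before the sweep results can be compared.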
Framework drift — lerobot v0.4 vs v0.5
This ckpt was retrained on lerobot v0.4.0 (pinned version), not the latest v0.5.x from the main repo. Reason:
| Training framework | 5-round per-orange p | Significance |
|---|---|---|
| lerobot v0.4.0 (this ckpt) | 0.440 (5-run pool, 25 ep) | baseline |
| lerobot v0.5.2 + 2 patches | 0.267 (4/15 single 5-round) | -39% vs v0.4.0 (left-tail p≈0.1%) |
| shadowHokage (v0.4 era, 2026-01) | 0.183 (4-h sweep, 20 ep) | -58% vs v0.4.0, Z=2.67 p=0.008 |
Key findings:
- lerobot PR #3406 (a8b72d96) changed the dataloader (persistent_workers / uint8 / prefetch); merged on 2026-04-19
- lerobot PR #3442 (1add4606) changed the ACT padding loss; merged on 2026-04-23
- Both PRs landed in v0.5.0 (2026-04-26); pinning back to v0.4.0 restores 0.440 per-orange
Full ablation + 3-model brainstorm in our design doc: act_finetune_pick_orange.html.
Limitations
- Dataset OOD on the 2nd-3rd orange: the dataset has 60 episodes with one "place the N-th orange" demo each, so state coverage for the 2nd and 3rd oranges is roughly an order of magnitude sparser than for the 1st. Even at horizon=70 with 5-run pooling, accuracy degrades linearly across oranges. This is a data issue, not a model issue.
- ±40% single-run variance in the 5-round protocol: any single 5-round number (including the 13/15 lucky tail) is not evidence on its own; pool at least 3 runs.
- No image augmentation or domain randomization, so real-world transfer is likely weak. This checkpoint is only validated in Isaac Sim simulation; real-robot deployment is not guaranteed.
Related
- Same-task comparisons:
  - wsagi/DiffusionPolicy-PickOrange: self-trained Diffusion Policy (267M, DDIM 32-step swap)
  - shadowHokage/act_policy: public v0.4-era ckpt (4-run pool = 18.3%)
  - LightwheelAI/leisaac-pick-orange-v0: GR00T N1.5 baseline
- Full training + eval recipe + framework drift investigation: vitorcen/LeIsaac-Training fork
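Acknowledgments
- The LeIsaac team and LightwheelAI for the task environment and dataset
- The LeRobot team for the ACT implementation and the async inference framework
- shadowHokage for publishing the training recipe used as the reproduction baseline (which exposed the framework drift issue)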
Citation
@inproceedings{zhao2023learning,
title={Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware},
author={Zhao, Tony Z. and Kumar, Vikash and Levine, Sergey and Finn, Chelsea},
booktitle={Robotics: Science and Systems},
year={2023}
}
License
Apache-2.0
