pi05_real_pb_from_left
A pi0.5 vision-language-action (VLA) model fine-tuned for real-robot manipulation.
Task
- Task: Push Block
- Training data: From-left mode only
- Dataset: real_push_block_from_left
- Robot: Franka Panda (7-DOF)
- Cameras: Base RGB + Wrist RGB (256x256)
Training Configuration
| Parameter | Value |
|---|---|
| Base model | pi0.5 (PaliGemma 2B + Gemma 2B action expert) |
| Total parameters | ~3.35B |
| Action dimension | 32 |
| Action horizon | 10 |
| Batch size | 16 |
| Training steps | 5,000 |
| Learning rate | Cosine decay: warmup=500 steps, peak=5e-5, end=5e-6 |
| Optimizer | AdamW (gradient clip norm=1.0) |
| GPUs | 8x NVIDIA A100 |
| Normalization | Quantile normalization |
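
The learning-rate row above can be read as a warmup followed by a cosine decay. The sketch below is a minimal pure-Python illustration, not the training code itself. It assumes a linear warmup starting from 0 and a decay horizon equal to the 5,000 training steps; neither detail is stated in the table, and the function name `lr_at` and its parameters are hypothetical.

```python
import math


def lr_at(step, warmup_steps=500, peak_lr=5e-5, end_lr=5e-6, decay_steps=5000):
    """Learning rate at `step`: linear warmup, then cosine decay to `end_lr`.

    Assumptions (not confirmed by the model card): warmup starts at 0 and
    the cosine decay ends at `decay_steps` = total training steps.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / (decay_steps - warmup_steps))
    # Cosine factor goes from 1 (start of decay) to 0 (end of decay).
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return end_lr + (peak_lr - end_lr) * cosine


if __name__ == "__main__":
    # Print the schedule at the steps listed under Checkpoints.
    for s in (0, 500, 3000, 4000, 4999):
        print(f"step {s:>4}: lr = {lr_at(s):.3e}")
```

Under these assumptions the rate peaks at step 500 and reaches the end value at step 5,000; a different warmup start or decay horizon would shift the values at the intermediate checkpoints.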
Checkpoints
- Step 3000: loss = 0.0064
- Step 4000: loss = 0.0037
- Step 4999 (final)
Loss Curve
| Step | Loss |
|---|---|
| 0 | 0.0946 |
| 500 | 0.0150 |
| 1000 | 0.0126 |
| 1500 | 0.0105 |
| 2000 | 0.0083 |
| 2500 | 0.0069 |
| 3000 | 0.0064 |
| 3500 | 0.0053 |
| 4000 | 0.0037 |
| 4500 | 0.0034 |
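
As a quick sanity check on the table above, the following sketch computes the relative loss reduction between the first and last logged steps and verifies that the logged loss decreases at every interval. The values are copied from the table; the variable names are illustrative only.

```python
# Logged training loss, copied from the Loss Curve table.
loss_curve = {
    0: 0.0946,
    500: 0.0150,
    1000: 0.0126,
    1500: 0.0105,
    2000: 0.0083,
    2500: 0.0069,
    3000: 0.0064,
    3500: 0.0053,
    4000: 0.0037,
    4500: 0.0034,
}

steps = sorted(loss_curve)
first_loss = loss_curve[steps[0]]
last_loss = loss_curve[steps[-1]]

# Fraction of the initial loss removed by the last logged step.
reduction = 1.0 - last_loss / first_loss

# True if every logged value is strictly lower than the previous one.
monotonic = all(loss_curve[a] > loss_curve[b] for a, b in zip(steps, steps[1:]))

print(f"relative reduction: {reduction:.1%}")
print(f"strictly decreasing: {monotonic}")
```

Most of the drop occurs in the first 500 steps, which coincides with the warmup phase of the learning-rate schedule.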
Part of Mode Editing Research
This checkpoint is part of the "Don't Filter Your Data, Edit Your Policy" project (CoRL 2026), investigating post-hoc behavior mode editing for robot policies using Classifier-Guided Distillation (CG-Distill).