pi05_real_pb_mixed
Fine-tuned pi0.5 VLA model for real robot manipulation.
Task
- Task: Push Block
- Training data: Mixed modes (from-left + from-right)
- Dataset: real_push_block_mixed
- Robot: Franka Panda (7-DOF)
- Cameras: Base RGB + Wrist RGB (256x256)
Training Configuration
| Parameter | Value |
|---|---|
| Base model | pi0.5 (PaliGemma 2B + Gemma 2B action expert) |
| Total parameters | ~3.35B |
| Action dimension | 32 |
| Action horizon | 10 |
| Batch size | 16 |
| Training steps | 5,000 |
| Learning rate | Cosine decay schedule: warmup=500 steps, peak=5e-5, end=5e-6 |
| Optimizer | AdamW (gradient clip norm=1.0) |
| GPUs | 8x NVIDIA A100 |
| Normalization | Quantile normalization |
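
The learning-rate row above can be read as a warmup followed by a cosine decay. The following is a minimal sketch of such a schedule in plain Python, not the training code itself. It assumes a linear warmup starting from zero and a decay that spans the full 5,000-step run; the card states neither detail, and the name `learning_rate` is hypothetical.

```python
import math

# Values taken from the Training Configuration table.
WARMUP_STEPS = 500
PEAK_LR = 5e-5
END_LR = 5e-6

# Assumption: the cosine decay ends at the last training step.
DECAY_STEPS = 5_000


def learning_rate(step: int) -> float:
    """Return the learning rate at a given step (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Assumption: linear warmup from 0 up to the peak value.
        return PEAK_LR * step / WARMUP_STEPS
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min((step - WARMUP_STEPS) / (DECAY_STEPS - WARMUP_STEPS), 1.0)
    # Cosine factor goes from 1 (start of decay) to 0 (end of decay).
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return END_LR + (PEAK_LR - END_LR) * cosine


if __name__ == "__main__":
    for step in (0, 250, 500, 2750, 4999):
        print(f"step {step:>4}: lr = {learning_rate(step):.3e}")
```

Under these assumptions the rate peaks at step 500 and is close to the end value by the final checkpoint.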
Checkpoints
- Step 3000: loss = 0.0071
- Step 4000: loss = 0.0047
- Step 4999 (final step)
Loss Curve
| Step | Loss |
|---|---|
| 0 | 0.0818 |
| 500 | 0.0155 |
| 1000 | 0.0120 |
| 1500 | 0.0110 |
| 2000 | 0.0084 |
| 2500 | 0.0075 |
| 3000 | 0.0071 |
| 3500 | 0.0060 |
| 4000 | 0.0047 |
| 4500 | 0.0042 |
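
As a quick sanity check on the table above, the short Python sketch below computes the relative loss reduction between the first and last logged steps and checks that the logged values decrease at every interval. The variable names are illustrative; the numbers are copied from the table.

```python
# Logged training loss, copied from the Loss Curve table.
loss_by_step = {
    0: 0.0818,
    500: 0.0155,
    1000: 0.0120,
    1500: 0.0110,
    2000: 0.0084,
    2500: 0.0075,
    3000: 0.0071,
    3500: 0.0060,
    4000: 0.0047,
    4500: 0.0042,
}

steps = sorted(loss_by_step)
first_loss = loss_by_step[steps[0]]
last_loss = loss_by_step[steps[-1]]

# Fraction of the initial loss removed by the last logged step.
reduction = 1.0 - last_loss / first_loss

# True if every logged value is lower than the one before it.
monotonic = all(
    loss_by_step[a] > loss_by_step[b] for a, b in zip(steps, steps[1:])
)

print(f"relative reduction: {reduction:.1%}")
print(f"strictly decreasing: {monotonic}")
```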
Part of Mode Editing Research
This checkpoint is part of the "Don't Filter Your Data, Edit Your Policy" project (CoRL 2026), investigating post-hoc behavior mode editing for robot policies using Classifier-Guided Distillation (CG-Distill).