Update model card: concise, consistent, add GitHub link
README.md
@@ -49,6 +49,22 @@ EdgeVLA-Tiny pushes the EdgeVLA architecture to its smallest configuration: Fast
Trained exclusively on [`lerobot/fmb`](https://huggingface.co/datasets/lerobot/fmb) (3-camera Franka Panda manipulation). Source code: [enfuse/edgevla](https://github.com/enfuse/edgevla)
## Intended Use & What You Can Do With This Model
**This model predicts 7-DoF robot actions** (x, y, z, rx, ry, rz, gripper) from 3 camera images. It outputs 50-step action chunks at 10Hz — each inference produces 5 seconds of continuous robot motion.
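
The chunk arithmetic above can be sketched in a few lines. This is an illustration of the output contract only: the array is a placeholder, and `send_to_robot` is a hypothetical stand-in, not part of this model's API.

```python
import numpy as np

# One inference yields a (50, 7) action chunk: 50 timesteps of
# [x, y, z, rx, ry, rz, gripper], executed at 10 Hz.
CHUNK_LEN = 50    # action steps per chunk
CONTROL_HZ = 10   # control frequency in Hz
N_DOF = 7         # x, y, z, rx, ry, rz, gripper

chunk = np.zeros((CHUNK_LEN, N_DOF))  # stand-in for a predicted chunk

# Each chunk covers CHUNK_LEN / CONTROL_HZ seconds of motion.
seconds_per_chunk = CHUNK_LEN / CONTROL_HZ  # 5.0 s

# A minimal executor steps through the chunk one action per tick.
for action in chunk:
    assert action.shape == (N_DOF,)
    # send_to_robot(action)  # hypothetical robot interface
```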
**Immediate uses:**
- **Deploy on a Franka Panda** (or compatible 7-DoF arm) with a 3-camera setup for FMB-style tabletop manipulation. Feed camera frames in, execute the predicted delta actions.
- **Fine-tune on your own robot data** — this is the most practical use. If you have any robot with cameras in [LeRobot format](https://github.com/huggingface/lerobot), this checkpoint is an excellent pretrained starting point. Fine-tuning at LR=3e-5 for 50K steps typically adapts well to new setups.
- **Edge deployment** — the smallest model in the EdgeVLA family and likely **the smallest open-source VLA model available**. At 164M params / 313MB FP16, it runs on even the most constrained Jetson devices. Estimated ~142ms on Jetson Orin AGX, ~57ms on H200. Fits on Jetson Orin Nano (8GB).
- **Real-time control** — at 17.7 Hz throughput on H200, this model can run closed-loop at well above the 10Hz control frequency, enabling reactive manipulation.
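
The real-time headroom implied by these figures can be sanity-checked with a few lines. All numbers below are the latency estimates quoted in this card, not new measurements, and single-frame latency is only an approximation of sustained throughput.

```python
# Can single-frame inference keep up with the 10 Hz control loop?
CONTROL_HZ = 10
control_period_ms = 1000 / CONTROL_HZ  # 100 ms budget per control step

# Estimated per-inference latencies from this card.
latency_ms = {"H200": 57, "Jetson Orin AGX": 142}

for device, ms in latency_ms.items():
    fits = ms <= control_period_ms
    print(f"{device}: {ms} ms per inference, within 100 ms budget: {fits}")

# Note: because one inference buys a 50-step chunk (5 s of motion),
# even a 142 ms inference can stay well ahead of the 10 Hz loop.
```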
**Important caveats:**
- All metrics below are **offline action prediction** on held-out FMB samples. There are **no closed-loop success rate numbers** — the model has not been validated on a physical robot completing full tasks.
- Trained specifically on FMB data (Franka Panda, specific manipulation tasks, 3-camera setup). It will **not generalize** to different robots, camera configurations, or tasks without fine-tuning.
- The model expects 3 camera inputs (side_1, side_2, wrist). For single-camera setups, you would need to fine-tune with `--empty_cameras` or retrain.
- As the smallest variant, Tiny trades some accuracy for speed — rz and x dimensions are slightly worse than SmolVLA (see per-dimension table below).
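
For the fine-tuning route mentioned above, a LeRobot training invocation might look roughly like the following. This is a sketch only: the script path and flag names follow recent LeRobot conventions but vary between versions, and `ORG/edgevla-tiny` / `your-org/your-dataset` are placeholders, so verify everything against the LeRobot documentation.

```shell
# Hypothetical LeRobot fine-tuning sketch; all flag names are
# assumptions -- check the LeRobot docs for your installed version.
# ORG/edgevla-tiny and your-org/your-dataset are placeholders.
python lerobot/scripts/train.py \
  --policy.path=ORG/edgevla-tiny \
  --dataset.repo_id=your-org/your-dataset \
  --steps=50000 \
  --policy.optimizer_lr=3e-5
```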
## Results (FMB Offline, 500 held-out samples)
| Metric | SmolVLA (450M) | **EdgeVLA-Tiny (164M)** | Delta |