Update model card: concise, consistent, add GitHub link
README.md
@@ -49,6 +49,22 @@ EdgeVLA-Tiny pushes the EdgeVLA architecture to its smallest configuration: Fast
Trained exclusively on [`lerobot/fmb`](https://huggingface.co/datasets/lerobot/fmb) (3-camera Franka Panda manipulation). Source code: [enfuse/edgevla](https://github.com/enfuse/edgevla)
## Intended Use & What You Can Do With This Model
**This model predicts 7-DoF robot actions** (x, y, z, rx, ry, rz, gripper) from 3 camera images. It outputs 50-step action chunks at 10Hz — each inference produces 5 seconds of continuous robot motion.
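
The chunk arithmetic above can be sketched in a few lines. This is an illustration of the output contract only: the array is a placeholder, and `send_to_robot` is a hypothetical stand-in, not part of this model's API.

```python
import numpy as np

# One inference yields a (50, 7) action chunk: 50 timesteps of
# [x, y, z, rx, ry, rz, gripper], executed at 10 Hz.
CHUNK_LEN = 50    # action steps per chunk
CONTROL_HZ = 10   # control frequency in Hz
N_DOF = 7         # x, y, z, rx, ry, rz, gripper

chunk = np.zeros((CHUNK_LEN, N_DOF))  # stand-in for a predicted chunk

# Each chunk covers CHUNK_LEN / CONTROL_HZ seconds of motion.
seconds_per_chunk = CHUNK_LEN / CONTROL_HZ  # 5.0 s

# A minimal executor steps through the chunk one action per tick.
for action in chunk:
    assert action.shape == (N_DOF,)
    # send_to_robot(action)  # hypothetical robot interface
```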
**Immediate uses:**
- **Deploy on a Franka Panda** (or compatible 7-DoF arm) with a 3-camera setup for FMB-style tabletop manipulation. Feed camera frames in, execute the predicted delta actions.
- **Fine-tune on your own robot data** — this is the most practical use. If you have any robot with cameras in [LeRobot format](https://github.com/huggingface/lerobot), this checkpoint is an excellent pretrained starting point. Fine-tuning at LR=3e-5 for 50K steps typically adapts well to new setups.
- **Edge deployment** — the smallest model in the EdgeVLA family and likely **the smallest open-source VLA model available**. At 164M params / 313MB FP16, it runs on even the most constrained Jetson devices. Estimated ~142ms on Jetson Orin AGX, ~57ms on H200. Fits on Jetson Orin Nano (8GB).
- **Real-time control** — at 17.7 Hz throughput on H200, this model can run closed-loop at well above the 10Hz control frequency, enabling reactive manipulation.
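
The real-time headroom implied by these figures can be sanity-checked with a few lines. All numbers below are the latency estimates quoted in this card, not new measurements, and single-frame latency is only an approximation of sustained throughput.

```python
# Can single-frame inference keep up with the 10 Hz control loop?
CONTROL_HZ = 10
control_period_ms = 1000 / CONTROL_HZ  # 100 ms budget per control step

# Estimated per-inference latencies from this card.
latency_ms = {"H200": 57, "Jetson Orin AGX": 142}

for device, ms in latency_ms.items():
    fits = ms <= control_period_ms
    print(f"{device}: {ms} ms per inference, within 100 ms budget: {fits}")

# Note: because one inference buys a 50-step chunk (5 s of motion),
# even a 142 ms inference can stay well ahead of the 10 Hz loop.
```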
**Important caveats:**
- All metrics below are **offline action prediction** on held-out FMB samples. There are **no closed-loop success rate numbers** — the model has not been validated on a physical robot completing full tasks.
- Trained specifically on FMB data (Franka Panda, specific manipulation tasks, 3-camera setup). It will **not generalize** to different robots, camera configurations, or tasks without fine-tuning.
- The model expects 3 camera inputs (side_1, side_2, wrist). For single-camera setups, you would need to fine-tune with `--empty_cameras` or retrain.
- As the smallest variant, Tiny trades some accuracy for speed — rz and x dimensions are slightly worse than SmolVLA (see per-dimension table below).
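
For the fine-tuning route mentioned above, a LeRobot training invocation might look roughly like the following. This is a sketch only: the script path and flag names follow recent LeRobot conventions but vary between versions, and `ORG/edgevla-tiny` / `your-org/your-dataset` are placeholders, so verify everything against the LeRobot documentation.

```shell
# Hypothetical LeRobot fine-tuning sketch; all flag names are
# assumptions -- check the LeRobot docs for your installed version.
# ORG/edgevla-tiny and your-org/your-dataset are placeholders.
python lerobot/scripts/train.py \
  --policy.path=ORG/edgevla-tiny \
  --dataset.repo_id=your-org/your-dataset \
  --steps=50000 \
  --policy.optimizer_lr=3e-5
```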
## Results (FMB Offline, 500 held-out samples)
| Metric | SmolVLA (450M) | **EdgeVLA-Tiny (164M)** | Delta |