cahlen committed · Commit 3f4b1a3 · verified · 1 Parent(s): 431713a

Update model card: concise, consistent, add GitHub link

Files changed (1): README.md (+16 −0)
README.md CHANGED
@@ -49,6 +49,22 @@ EdgeVLA-Tiny pushes the EdgeVLA architecture to its smallest configuration: Fast
 
  Trained exclusively on [`lerobot/fmb`](https://huggingface.co/datasets/lerobot/fmb) (3-camera Franka Panda manipulation). Source code: [enfuse/edgevla](https://github.com/enfuse/edgevla)
 
+ ## Intended Use & What You Can Do With This Model
+
+ **This model predicts 7-DoF robot actions** (x, y, z, rx, ry, rz, gripper) from 3 camera images. It outputs 50-step action chunks at 10Hz — each inference produces 5 seconds of continuous robot motion.
+
+ **Immediate uses:**
+ - **Deploy on a Franka Panda** (or compatible 7-DoF arm) with a 3-camera setup for FMB-style tabletop manipulation. Feed camera frames in, execute the predicted delta actions.
+ - **Fine-tune on your own robot data** — this is the most practical use. If you have any robot with cameras in [LeRobot format](https://github.com/huggingface/lerobot), this checkpoint is an excellent pretrained starting point. Fine-tuning at LR=3e-5 for 50K steps typically adapts well to new setups.
+ - **Edge deployment** — the smallest model in the EdgeVLA family and likely **the smallest open-source VLA model available**. At 164M params / 313MB FP16, it runs on even the most constrained Jetson devices. Estimated ~142ms on Jetson Orin AGX, ~57ms on H200. Fits on Jetson Orin Nano (8GB).
+ - **Real-time control** — at 17.7 Hz throughput on H200, this model can run closed-loop at well above the 10Hz control frequency, enabling reactive manipulation.
+
+ **Important caveats:**
+ - All metrics below are **offline action prediction** on held-out FMB samples. There are **no closed-loop success rate numbers** — the model has not been validated on a physical robot completing full tasks.
+ - Trained specifically on FMB data (Franka Panda, specific manipulation tasks, 3-camera setup). It will **not generalize** to different robots, camera configurations, or tasks without fine-tuning.
+ - The model expects 3 camera inputs (side_1, side_2, wrist). For single-camera setups, you would need to fine-tune with `--empty_cameras` or retrain.
+ - As the smallest variant, Tiny trades some accuracy for speed — rz and x dimensions are slightly worse than SmolVLA (see per-dimension table below).
+
  ## Results (FMB Offline, 500 held-out samples)
 
  | Metric | SmolVLA (450M) | **EdgeVLA-Tiny (164M)** | Delta |
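The chunking and latency figures in the added section can be sanity-checked with a few lines of arithmetic. This is only an illustrative sketch: the 50-step chunk size, 10 Hz control rate, and ~142 ms / ~57 ms latency estimates come from the card above; the variable and device names are made up for the example.

```python
# Sanity-check the action-chunking and latency math quoted in the model card.
CHUNK_STEPS = 50   # action steps produced per inference (from the card)
CONTROL_HZ = 10    # robot control frequency (from the card)

# Each inference covers chunk_steps / control_hz seconds of motion.
seconds_per_chunk = CHUNK_STEPS / CONTROL_HZ
print(seconds_per_chunk)  # 5.0 -> matches the "5 seconds of continuous motion" claim

# Estimated single-inference latencies from the card, in seconds.
latency = {"jetson_orin_agx": 0.142, "h200": 0.057}

for device, t in latency.items():
    # Max replanning rate if you re-ran inference every control step.
    rate_hz = 1.0 / t
    # True means per-step closed-loop replanning clears the 10 Hz control rate;
    # where it is False, the 50-step chunk is what amortizes the latency.
    print(device, round(rate_hz, 1), rate_hz > CONTROL_HZ)
```

Running this shows ~17.5 Hz on H200 (in line with the 17.7 Hz throughput figure, which exceeds the 10 Hz control rate) and ~7.0 Hz on Jetson Orin AGX, where the 5-second action chunks are what make real-time execution practical.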