HapticVLA: Contact-Rich Manipulation via Vision-Language-Action Model without Inference-Time Tactile Sensing
Paper: arXiv:2603.15257
Fine-tuned SmolVLA for left-arm (6-DOF) manipulation on the Crab bimanual mobile manipulator.
Base model: lerobot/smolvla_base (450M params).

```python
import torch

# Load the fine-tuned left-arm checkpoint on CPU; move to GPU as needed.
checkpoint = torch.load("best/model.pt", map_location="cpu")
```
See the Advanced-Robotic-Manipulation/crab repository for the full inference pipeline.
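To run the policy outside that pipeline, the checkpoint can be loaded into a SmolVLA policy instance. The sketch below is a minimal example; the lerobot import path and the checkpoint layout (plain state_dict vs. a wrapped dict) are assumptions that may differ across lerobot versions, so defer to the crab repository for the authoritative loading code.

```python
import torch
from lerobot.common.policies.smolvla.modeling_smolvla import SmolVLAPolicy

# Instantiate the base policy, then overwrite it with the fine-tuned weights.
# NOTE: the import path above and the checkpoint layout below are assumptions;
# check the crab repository's inference pipeline for the exact loading code.
policy = SmolVLAPolicy.from_pretrained("lerobot/smolvla_base")
checkpoint = torch.load("best/model.pt", map_location="cpu")
state_dict = checkpoint.get("state_dict", checkpoint)  # unwrap if nested
policy.load_state_dict(state_dict)
policy.eval()
```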
The full training configuration is provided in config.yaml; a loading sketch follows below.
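A minimal way to inspect it, assuming config.yaml is a standard YAML file (the specific keys are whatever the file defines and are not reproduced here):

```python
import yaml

# Print the training configuration shipped alongside this checkpoint.
with open("config.yaml") as f:
    config = yaml.safe_load(f)
for key, value in config.items():
    print(f"{key}: {value}")
```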
If you use this model, please cite our paper:
```bibtex
@article{gubernatorov2026hapticvla,
  title={HapticVLA: Contact-Rich Manipulation via Vision-Language-Action Model without Inference-Time Tactile Sensing},
  author={Gubernatorov, Konstantin and Sannikov, Mikhail and Mikhalchuk, Ilya and Kuznetsov, Egor and Artemov, Makar and Ouwatobi, Ogunwoye Faith and Fernando, Marcelino and Asanov, Artem and Guo, Ziang and Tsetserukou, Dzmitry},
  journal={arXiv preprint arXiv:2603.15257},
  year={2026}
}
```