---
tags:
- hermes
- navigation
- indoor-robotics
- scene-understanding
- pytorch
- onnx
license: apache-2.0
---

# HERMES Navigation Model v1

Indoor semantic navigation model combining vision and 3D point cloud understanding.

## Architecture

- **Vision encoder**: CNN backbone (5-layer, 256-dim output)
- **Point cloud encoder**: MLP with max-pooling (2048 points → 256-dim)
- **Fusion**: 512-dim MLP with LayerNorm + Dropout
- **Heads**: Direction (3D unit vector) + Traversability (scalar 0-1)

## Training

- **Dataset**: SUN RGB-D (5,509 indoor scenes)
- **Split**: 90/5/5 (train/val/test)
- **Optimizer**: AdamW (lr=2e-4, cosine schedule)
- **Mixed precision**: bf16 on CUDA

## Formats

| Format | File | Use Case |
|--------|------|----------|
| PyTorch | `pytorch/hermes_nav_v1.pth` | Training/fine-tuning |
| SafeTensors | `pytorch/hermes_nav_v1.safetensors` | Fast safe loading |
| ONNX | `onnx/hermes_nav_v1.onnx` | Cross-platform inference |

## Usage

```python
import torch

from hermes.training.model import HermesNavigationModel

model = HermesNavigationModel()
model.load_state_dict(torch.load("hermes_nav_v1.pth", map_location="cpu"))
model.eval()

image = torch.randn(1, 3, 256, 256)   # RGB image batch
points = torch.randn(1, 2048, 3)      # 2048 XYZ points per sample

with torch.no_grad():
    output = model(image, points)

# output["direction"]: [1, 3] goal direction
# output["traversability"]: [1, 1] traversability score
```

## Citation

ANIMA Suite — Robot Flow Labs
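The point-cloud encoder expects a fixed 2048 points per sample, but real sensor clouds rarely arrive at exactly that size. Below is a minimal preprocessing sketch, assuming random subsampling/resampling is acceptable; the helper name and the fixed-size policy are illustrative, not part of the released API:

```python
import numpy as np


def prepare_points(cloud: np.ndarray, n_points: int = 2048, seed: int = 0) -> np.ndarray:
    """Sample or pad an (N, 3) XYZ cloud to exactly (n_points, 3).

    Hypothetical helper: the model card only specifies the input shape
    [B, 2048, 3], not how clouds should be resampled to reach it.
    """
    rng = np.random.default_rng(seed)
    n = cloud.shape[0]
    if n >= n_points:
        # Dense cloud: random subsample without replacement.
        idx = rng.choice(n, size=n_points, replace=False)
    else:
        # Sparse cloud: pad by resampling existing points with replacement.
        idx = rng.choice(n, size=n_points, replace=True)
    return cloud[idx].astype(np.float32)
```

Because the encoder uses max-pooling, it is permutation-invariant over points, so random sampling order does not matter.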
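The two heads emit a 3D goal-direction vector and a 0-1 traversability score. A hedged sketch of turning those raw outputs into a planar steering command; the 0.5 cutoff and the x-forward/y-left axis convention are illustrative assumptions not stated by the model card:

```python
import math


def to_command(direction, traversability, threshold=0.5):
    """Convert model outputs to (yaw_radians, go): hypothetical post-processing.

    Assumes x is forward and y is left (axis convention not specified by
    the model card) and an illustrative traversability cutoff of 0.5.
    """
    x, y, z = direction
    # Renormalize in case the head's output is not exactly unit length.
    norm = math.sqrt(x * x + y * y + z * z) or 1.0
    x, y = x / norm, y / norm
    yaw = math.atan2(y, x)          # heading in the ground plane
    go = traversability >= threshold
    return yaw, go
```

In practice the threshold would be tuned on held-out scenes rather than fixed at 0.5.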