---
tags:
- hermes
- navigation
- indoor-robotics
- scene-understanding
- pytorch
- onnx
license: apache-2.0
---

# HERMES Navigation Model v1

Indoor semantic navigation model combining vision and 3D point cloud understanding.

## Architecture

- **Vision encoder**: CNN backbone (5-layer, 256-dim output)
- **Point cloud encoder**: MLP with max-pooling (2048 points → 256-dim)
- **Fusion**: 512-dim MLP with LayerNorm + Dropout
- **Heads**: Direction (3D unit vector) + Traversability (scalar 0-1)

## Training

- **Dataset**: SUN RGB-D (5,509 indoor scenes)
- **Split**: 90/5/5 (train/val/test)
- **Optimizer**: AdamW (lr=2e-4, cosine schedule)
- **Mixed precision**: bf16 on CUDA

## Formats

| Format | File | Use Case |
|--------|------|----------|
| PyTorch | `pytorch/hermes_nav_v1.pth` | Training/fine-tuning |
| SafeTensors | `pytorch/hermes_nav_v1.safetensors` | Fast safe loading |
| ONNX | `onnx/hermes_nav_v1.onnx` | Cross-platform inference |

## Usage

```python
import torch

from hermes.training.model import HermesNavigationModel

model = HermesNavigationModel()
model.load_state_dict(torch.load("hermes_nav_v1.pth", map_location="cpu"))
model.eval()

image = torch.randn(1, 3, 256, 256)   # RGB image batch
points = torch.randn(1, 2048, 3)      # 2048 XYZ points per sample

with torch.no_grad():
    output = model(image, points)

# output["direction"]: [1, 3] goal direction
# output["traversability"]: [1, 1] traversability score
```

## Citation

ANIMA Suite — Robot Flow Labs
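The point-cloud encoder expects a fixed 2048 points per sample, but real sensor clouds rarely arrive at exactly that size. Below is a minimal preprocessing sketch, assuming random subsampling/resampling is acceptable; the helper name and the fixed-size policy are illustrative, not part of the released API:

```python
import numpy as np


def prepare_points(cloud: np.ndarray, n_points: int = 2048, seed: int = 0) -> np.ndarray:
    """Sample or pad an (N, 3) XYZ cloud to exactly (n_points, 3).

    Hypothetical helper: the model card only specifies the input shape
    [B, 2048, 3], not how clouds should be resampled to reach it.
    """
    rng = np.random.default_rng(seed)
    n = cloud.shape[0]
    if n >= n_points:
        # Dense cloud: random subsample without replacement.
        idx = rng.choice(n, size=n_points, replace=False)
    else:
        # Sparse cloud: pad by resampling existing points with replacement.
        idx = rng.choice(n, size=n_points, replace=True)
    return cloud[idx].astype(np.float32)
```

Because the encoder uses max-pooling, it is permutation-invariant over points, so random sampling order does not matter.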
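The two heads emit a 3D goal-direction vector and a 0-1 traversability score. A hedged sketch of turning those raw outputs into a planar steering command; the 0.5 cutoff and the x-forward/y-left axis convention are illustrative assumptions not stated by the model card:

```python
import math


def to_command(direction, traversability, threshold=0.5):
    """Convert model outputs to (yaw_radians, go): hypothetical post-processing.

    Assumes x is forward and y is left (axis convention not specified by
    the model card) and an illustrative traversability cutoff of 0.5.
    """
    x, y, z = direction
    # Renormalize in case the head's output is not exactly unit length.
    norm = math.sqrt(x * x + y * y + z * z) or 1.0
    x, y = x / norm, y / norm
    yaw = math.atan2(y, x)          # heading in the ground plane
    go = traversability >= threshold
    return yaw, go
```

In practice the threshold would be tuned on held-out scenes rather than fixed at 0.5.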