🏗️ SiteSense — Model Weights

Real-Time Construction Equipment Monitoring via Aerial Computer Vision

Overview

This repository hosts the trained model weights for SiteSense, a real-time pipeline that detects, tracks, and re-identifies heavy construction equipment in drone/aerial video footage and classifies its activity.

The system processes each frame through a multi-phase pipeline:

Video Frame → Detector (RF-DETR or YOLO26-L) → BoT-SORT Tracking → DINOv3 Re-ID → Activity Classification → Kafka Events

Two interchangeable detectors are provided. Switch at runtime via the DETECTOR_TYPE environment variable (rfdetr or yolo) — no rebuild required.
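As a minimal sketch of how such a runtime switch could work, the snippet below maps DETECTOR_TYPE to one of the two weight files listed in this repository. The helper name `select_detector_weights` is hypothetical, and the `rfdetr` fallback follows the "Default" note in the Model Weights table; the actual pipeline may resolve its default differently.

```python
import os

# Weight filenames taken from the Model Weights table.
DETECTOR_WEIGHTS = {
    "rfdetr": "rfdetr_construction.pth",
    "yolo": "yolo26l_construction_v1.pt",
}


def select_detector_weights(env=None, default="rfdetr"):
    """Hypothetical helper: map DETECTOR_TYPE to (detector, weights filename).

    `env` is any mapping of environment variables; os.environ is used if omitted.
    The default value is an assumption, not confirmed pipeline behavior.
    """
    env = os.environ if env is None else env
    detector = env.get("DETECTOR_TYPE", default).strip().lower()
    if detector not in DETECTOR_WEIGHTS:
        valid = ", ".join(sorted(DETECTOR_WEIGHTS))
        raise ValueError(f"Unknown DETECTOR_TYPE {detector!r}; expected one of: {valid}")
    return detector, DETECTOR_WEIGHTS[detector]


if __name__ == "__main__":
    # Example: simulate DETECTOR_TYPE=yolo without touching the real environment.
    detector, weights = select_detector_weights({"DETECTOR_TYPE": "yolo"})
    print(detector, weights)
```

Because the choice is read when the process starts, changing the variable only requires restarting the service, not rebuilding the image.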

Model Weights

File	Size	Architecture	Task	Notes
`rfdetr_construction.pth`	122 MB	RF-DETR (Real-time Foundation DETR)	8-class object detection	Default — best accuracy, NMS-free set prediction
`yolo26l_construction_v1.pt`	51 MB	YOLO26-L (Ultralytics, 24.8 M params)	8-class object detection	Faster alternative — STAL, NMS-free, ProgLoss
`dinov3_reid_head.pth`	5.4 MB	Linear projection head (1536→256→128)	Equipment re-identification	Trained contrastively on tracked equipment crops
`osnet_x0_25_msmt17.pt`	2.9 MB	OSNet x0.25	Appearance-based Re-ID for BoT-SORT	Pretrained on MSMT17

Note: The DINOv3 ViT-B/16 backbone (~327 MB) is not included here. It is auto-downloaded from facebook/dinov3-vitb16-pretrain-lvd1689m on first run using your HF_TOKEN.

Detection Classes

Both detectors are fine-tuned on the same merged MOCS + ACID v2 dataset to recognize 8 classes of construction equipment from aerial perspectives:

ID	Class	ID	Class
0	Excavator	4	Mobile Crane
1	Dump Truck	5	Tower Crane
2	Bulldozer	6	Roller Compactor
3	Wheel Loader	7	Cement Mixer

Training Results

Both detectors were trained on the same train/val/test split (42,733 / 4,615 / 990 images) so that the results are directly comparable. The numbers below are from the val split (4,615 images).

Detector Comparison (val split)

Metric	RF-DETR (default)	YOLO26-L	Δ (RF − YOLO)
mAP@50:95	0.761	0.740	+2.1 pts
mAP@50	0.910	0.905	+0.5 pts
F1 Score	0.886	0.876	+1.0 pts
Precision	0.929	0.924	+0.5 pts
Recall	0.847	0.834	+1.3 pts
FPS (RTX 3050 Ti)	9–10	11–13	YOLO faster
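The F1 row can be cross-checked against the Precision and Recall rows. The sketch below assumes the reported F1 is the harmonic mean of the reported precision and recall (the table does not state how it was computed); the values are copied from the table.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the comparison table.
reported = {
    "RF-DETR": (0.929, 0.847, 0.886),
    "YOLO26-L": (0.924, 0.834, 0.876),
}

for name, (precision, recall, f1_reported) in reported.items():
    f1 = f1_score(precision, recall)
    print(f"{name}: recomputed F1 = {f1:.3f}, reported = {f1_reported:.3f}")
```

Recomputing from the three-decimal inputs agrees with the table to within 0.001; a difference in the last digit is expected because the inputs are already rounded.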

RF-DETR wins on 7 of 8 classes in per-class AP@50:95; only bulldozer goes to YOLO26-L (0.796 vs 0.785). The largest RF-DETR margins are on the most under-represented classes, mobile_crane (+4.7 pts) and tower_crane (+6.0 pts). A plausible explanation is that set-based prediction handles long boom shapes and heavy occlusion better than YOLO26-L's dense detection head.

Per-class AP@50:95

Class	RF-DETR	YOLO26-L
Excavator	0.811	0.806
Dump Truck	0.675	0.661
Bulldozer	0.785	0.796
Wheel Loader	0.810	0.792
Mobile Crane	0.675	0.628
Tower Crane	0.692	0.632
Roller Compactor	0.838	0.825
Cement Mixer	0.800	0.779
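The per-class numbers above can be summarized with a few lines of Python. This sketch copies the table values, computes the RF-DETR minus YOLO26-L margin per class in points, and averages the per-class values as a sanity check against the overall mAP@50:95 (assuming the overall figure is an unweighted mean over the 8 classes).

```python
# (RF-DETR, YOLO26-L) AP@50:95 per class, copied from the table above.
AP = {
    "Excavator": (0.811, 0.806),
    "Dump Truck": (0.675, 0.661),
    "Bulldozer": (0.785, 0.796),
    "Wheel Loader": (0.810, 0.792),
    "Mobile Crane": (0.675, 0.628),
    "Tower Crane": (0.692, 0.632),
    "Roller Compactor": (0.838, 0.825),
    "Cement Mixer": (0.800, 0.779),
}

# Margin in percentage points; positive means RF-DETR is ahead.
deltas = {cls: round((rf - yolo) * 100, 1) for cls, (rf, yolo) in AP.items()}
rf_wins = [cls for cls, delta in deltas.items() if delta > 0]
largest = sorted(deltas.items(), key=lambda item: item[1], reverse=True)[:2]

# Unweighted mean over classes for each detector.
mean_rf = sum(rf for rf, _ in AP.values()) / len(AP)
mean_yolo = sum(yolo for _, yolo in AP.values()) / len(AP)

print(f"RF-DETR ahead on {len(rf_wins)} of {len(AP)} classes")
print("Largest margins:", largest)
print(f"Mean AP@50:95: RF-DETR {mean_rf:.3f}, YOLO26-L {mean_yolo:.3f}")
```

The unweighted means round to 0.761 and 0.740, matching the mAP@50:95 row of the detector comparison.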

DINOv3 Re-ID Projection Head

Metric	Value
Contrastive Loss	0.0482
Accuracy	96.8%
Embedding Dim	128-d L2-normalized
Training Pairs	~12,000 positive pairs
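Since the head outputs L2-normalized embeddings, cosine similarity between two embeddings reduces to a dot product. The following is a minimal sketch of how such embeddings could be matched against a gallery of known equipment; the function names, the gallery structure, and the 0.7 threshold are illustrative assumptions, not the pipeline's actual matching logic. Toy 3-d vectors stand in for the 128-d embeddings.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """For unit-length vectors, cosine similarity is just the dot product."""
    return sum(x * y for x, y in zip(a, b))


def match_identity(query, gallery, threshold=0.7):
    """Return (identity, score) for the best gallery match, or (None, score).

    `gallery` maps an identity label to a unit-length embedding.
    The threshold value is a hypothetical choice for illustration.
    """
    best_id, best_score = None, -1.0
    for identity, embedding in gallery.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_id, best_score = identity, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score


# Toy example with 3-d vectors; real embeddings would be 128-d.
gallery = {
    "excavator_01": l2_normalize([1.0, 0.0, 0.0]),
    "dump_truck_02": l2_normalize([0.0, 1.0, 0.0]),
}
query = l2_normalize([0.9, 0.1, 0.0])
print(match_identity(query, gallery))
```

A query that falls below the threshold for every gallery entry returns `None`, which a tracker could treat as a new identity.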

Quick Start

Option A: Download All Weights (Recommended)

pip install huggingface_hub
huggingface-cli download Zaafan/sitesense-weights --local-dir models/

This pulls all four weight files into your models/ directory in one step: both detectors, the DINOv3 Re-ID projection head, and the OSNet Re-ID model.

Option B: Python API

from huggingface_hub import hf_hub_download

# Detectors (pick one or both)
hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="rfdetr_construction.pth",     local_dir="models/")
hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="yolo26l_construction_v1.pt",  local_dir="models/")

# Re-ID
hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="dinov3_reid_head.pth",        local_dir="models/")
hf_hub_download(repo_id="Zaafan/sitesense-weights", filename="osnet_x0_25_msmt17.pt",       local_dir="models/")

Option C: Auto-Download (Zero Setup)

The SiteSense pipeline automatically downloads missing weights on first run:

# In services/cv-inference/main.py — resolve_weights() handles this transparently.
# It picks the right file based on DETECTOR_TYPE (yolo or rfdetr).
weights_path = resolve_weights('yolo26l_construction_v1.pt')  # local first, HF fallback
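For readers who want the same "local first, HF fallback" behavior outside the pipeline, here is a minimal sketch. It is not the implementation of `resolve_weights()` in services/cv-inference/main.py; the name `resolve_weights_sketch` and the `models_dir` parameter are assumptions made for illustration. Only `hf_hub_download` and the repository ID come from this README.

```python
from pathlib import Path

REPO_ID = "Zaafan/sitesense-weights"


def resolve_weights_sketch(filename, models_dir="models"):
    """Return a local path to `filename`, downloading it only if missing.

    Hypothetical stand-in for the pipeline's resolve_weights().
    """
    local_path = Path(models_dir) / filename
    if local_path.is_file():
        # Local copy found; no network access needed.
        return str(local_path)

    # Imported lazily so the local-only path works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=filename, local_dir=models_dir)
```

Calling `resolve_weights_sketch("yolo26l_construction_v1.pt")` returns the existing file under models/ if it is there, and otherwise fetches it from the Hub into the same directory.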

Usage with SiteSense Pipeline

# 1. Clone the repository
git clone https://github.com/Mahmoud-Zaafan/SiteSense.git
cd SiteSense

# 2. Download weights
huggingface-cli download Zaafan/sitesense-weights --local-dir models/

# 3. Configure environment
cp .env.example .env

# 4. Launch infrastructure
docker compose up --build -d

# 5a. Run the pipeline with the default detector
docker compose --profile pipeline up cv-inference

# 5b. Or select a detector explicitly at runtime (rfdetr or yolo); no rebuild needed
DETECTOR_TYPE=rfdetr docker compose --profile pipeline up cv-inference

Citation

If you use these weights in your research or projects, please cite:

@misc{sitesense2025,
  author = {Mahmoud Zaafan},
  title  = {SiteSense: Real-Time Construction Equipment Monitoring via Aerial Computer Vision},
  year   = {2025},
  url    = {https://github.com/Mahmoud-Zaafan/SiteSense}
}

License

All weights are released under the MIT License.
