# Project ARGUS — LFM2.5-VL Military Satellite Detection Adapter

*Autonomous Reconnaissance & Ground Understanding System · built for the Liquid AI x DPhi Space "AI in Space" hackathon*
## Overview
This is a LoRA adapter fine-tuned on top of LiquidAI/LFM2.5-VL-450M for military object detection in satellite imagery. It enables the base VLM to output structured JSON tactical reports directly from overhead reconnaissance images — replacing traditional multi-stage YOLO detection pipelines with a single unified inference pass.
## Key Capabilities
| Capability | Description |
|---|---|
| Military Vehicle Detection | Tanks, APCs, trucks, artillery, civilian vehicles |
| Aerial Asset Detection | Aircraft, helicopters, UAVs at airfields |
| Naval Detection | Ships, submarines, harbor installations |
| Infrastructure Analysis | Bridges, storage tanks, port cranes, helipads |
| Threat Assessment | LOW / MEDIUM / HIGH classification per target |
| Tactical Reasoning | Natural language assessment for each detection |
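The detection report format implied by the capabilities above can be written down as a small schema. This is an illustrative sketch, not code shipped with the adapter: the `Detection` type and `is_valid_detection` helper are hypothetical names for validating model-emitted records before acting on them.

```python
from typing import List, TypedDict

class Detection(TypedDict):
    label: str
    bbox: List[float]   # normalized [x1, y1, x2, y2]
    threat_level: str   # "LOW" | "MEDIUM" | "HIGH"
    confidence: float
    reasoning: str

def is_valid_detection(det: dict) -> bool:
    # Minimal structural check before trusting a model-emitted detection
    return (
        isinstance(det.get("label"), str)
        and isinstance(det.get("bbox"), list)
        and len(det["bbox"]) == 4
        and all(isinstance(v, (int, float)) and 0.0 <= v <= 1.0 for v in det["bbox"])
        and det.get("threat_level") in {"LOW", "MEDIUM", "HIGH"}
        and isinstance(det.get("confidence"), (int, float))
        and 0.0 <= det["confidence"] <= 1.0
    )

print(is_valid_detection({
    "label": "Tank", "bbox": [0.1, 0.2, 0.3, 0.4],
    "threat_level": "HIGH", "confidence": 0.9, "reasoning": "exposed position",
}))  # True
```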
## Training Details

- **Base Model:** LiquidAI/LFM2.5-VL-450M
- **Method:** QLoRA (4-bit) via Unsloth
- **LoRA Config:** r=16, alpha=32, all linear layers
- **Trainable Parameters:** 1,376,256 / 450,095,104 (0.31%)
- **Training Data:** 3,512 samples (MVRSD military vehicles + DOTA aerial objects)
- **Epochs:** 3 (1,317 steps)
- **Final Loss:** 0.4017
- **Hardware:** NVIDIA T4 GPU
- **Training Time:** ~60 minutes
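As a sanity check on the 0.31% figure above: a rank-r LoRA adapter adds two low-rank factors per adapted linear layer, so the added parameter count follows directly from the layer shapes. The `lora_param_count` helper below is an illustrative formula, not part of the released training code.

```python
def lora_param_count(d_in: int, d_out: int, r: int = 16) -> int:
    # Each adapted linear layer gains two low-rank factors:
    # A with shape (r, d_in) and B with shape (d_out, r)
    return r * d_in + d_out * r

# Reproduce the trainable fraction reported above
trainable = 1_376_256
total = 450_095_104
print(f"trainable fraction: {100 * trainable / total:.2f}%")  # 0.31%
```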
## Datasets
| Dataset | Samples | Source |
|---|---|---|
| MVRSD | 12 (demo) | Military Vehicle Remote Sensing Dataset |
| DOTA | 3,500 | Large-scale aerial object detection |
## Usage

### With PEFT (recommended)
```python
from transformers import AutoProcessor, AutoModelForImageTextToText
from peft import PeftModel
from PIL import Image

# Load base + adapter
base = AutoModelForImageTextToText.from_pretrained(
    "LiquidAI/LFM2.5-VL-450M",
    device_map="auto",
    torch_dtype="auto",
)
model = PeftModel.from_pretrained(base, "johnny711/argus-lfm-lora")
model = model.merge_and_unload()  # merge adapter weights for faster inference
processor = AutoProcessor.from_pretrained("LiquidAI/LFM2.5-VL-450M")

# Run detection
image = Image.open("satellite_image.jpg")
prompt = """You are an orbital intelligence analyst examining satellite imagery \
from a defense reconnaissance satellite at ~800 km altitude.
Detect ALL military-relevant objects visible in this image. For each object, provide:
- "label": specific type of object
- "bbox": normalized bounding box [x1, y1, x2, y2] in [0,1]
- "threat_level": "LOW", "MEDIUM", or "HIGH"
- "confidence": 0.0 to 1.0
- "reasoning": brief tactical assessment
Return a JSON array. If no targets visible, return: []"""

messages = [{"role": "user", "content": [
    {"type": "image", "image": image},
    {"type": "text", "text": prompt},
]}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True,
    return_tensors="pt", return_dict=True, tokenize=True,
).to(model.device)

# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.1)
new_tokens = outputs[:, inputs["input_ids"].shape[1]:]  # strip the prompt tokens
result = processor.batch_decode(new_tokens, skip_special_tokens=True)[0]
print(result)
```
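Small VLMs occasionally wrap the JSON array in extra prose, so it is worth parsing the decoded output defensively. A minimal sketch, assuming the report is the first bracketed array in the text; `parse_detections` is a hypothetical helper, not part of the adapter:

```python
import json
import re

def parse_detections(raw: str) -> list:
    """Extract the first JSON array from model output; return [] on failure."""
    match = re.search(r"\[.*\]", raw, re.DOTALL)
    if match is None:
        return []
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    return parsed if isinstance(parsed, list) else []

print(parse_detections('Report follows: [{"label": "Tank", "confidence": 0.9}] End.'))
# [{'label': 'Tank', 'confidence': 0.9}]
```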
## Example Output

```json
[
  {
    "label": "Small Military Vehicle",
    "bbox": [0.0, 0.3438, 0.0645, 0.0664],
    "threat_level": "LOW",
    "confidence": 0.85,
    "reasoning": "Small Military Vehicle detected near tree cover, likely concealed staging area"
  },
  {
    "label": "Naval Vessel",
    "bbox": [0.0547, 0.5625, 0.0664, 0.0527],
    "threat_level": "HIGH",
    "confidence": 0.82,
    "reasoning": "Naval Vessel visible in desert terrain, limited concealment"
  }
]
```
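Since the boxes are normalized to [0,1], downstream consumers need to scale them back to pixel space before cropping or overlaying. A minimal sketch; `bbox_to_pixels` is an illustrative helper, and the coordinates assume the `[x1, y1, x2, y2]` convention from the prompt:

```python
def bbox_to_pixels(bbox, width, height):
    """Map a normalized [x1, y1, x2, y2] box onto image pixel coordinates."""
    x1, y1, x2, y2 = bbox
    return (round(x1 * width), round(y1 * height),
            round(x2 * width), round(y2 * height))

print(bbox_to_pixels([0.25, 0.5, 0.75, 1.0], 1024, 768))  # (256, 384, 768, 768)
```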
## Project ARGUS Architecture

```
Satellite Image (GigaPixel)
        |
[Phase 1] LFM2.5-VL + LoRA   --> JSON detections (this model)
        |
[Phase 2] Depth Anything 3   --> 3D reality check (decoy filtering)
        |
[Phase 3] Report Assembly    --> Tactical JSON downlink (bytes, not GB)
```
**The Problem:** Military satellites capture massive images but have limited downlink bandwidth. Sending gigabytes of raw imagery to ground stations wastes hours.

**Our Solution:** Run AI at the edge (in orbit). This adapter enables a 450M-parameter VLM to perform unified detection, classification, and tactical reasoning in a single inference pass, producing a tiny JSON report instead of raw imagery.
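A back-of-envelope estimate makes the bandwidth argument concrete. All figures below are hypothetical placeholders for illustration; real savings depend on sensor resolution, compression, and scene density:

```python
# Hypothetical figures for illustration; not measured values
raw_capture_bytes = 2_000_000_000   # ~2 GB gigapixel capture
bytes_per_detection = 200           # rough size of one JSON detection record
num_detections = 50                 # busy scene
report_bytes = bytes_per_detection * num_detections

print(f"report: {report_bytes} bytes, "
      f"downlink reduction: {raw_capture_bytes // report_bytes:,}x")
# report: 10000 bytes, downlink reduction: 200,000x
```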
## Developed by

- johnny711 (GitHub)
- Hackathon: Liquid AI x DPhi Space "AI in Space"
## License

Apache 2.0