Flystral β€” LoRA Fine-tuned Ministral 3B for Drone Flight Control

LoRA adapter for real-time drone telemetry prediction from camera images, built for the Louise AI Safety Drone Escort system.

What it does

Given a drone camera frame, the model outputs a telemetry vector (velocity, orientation, and altitude adjustments) that drives autonomous flight control, letting the drone react to visual obstacles and environmental conditions in real time during pedestrian escort missions.
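Before a predicted telemetry vector can drive the flight controller, it has to be clamped into actuator commands. A minimal sketch of that last step, assuming the first four values are (vx, vy, vz, yaw_rate) in SI units and using hypothetical safety limits (neither the vector layout nor the limits are specified in this card):

```python
def telemetry_to_command(vec, v_max=5.0, yaw_max=1.5):
    """Clamp a predicted telemetry vector into a safe flight command.

    Assumes vec[:4] = (vx, vy, vz, yaw_rate); v_max and yaw_max are
    hypothetical safety limits, not values from this model card.
    """
    def clamp(x, lim):
        return max(-lim, min(lim, x))

    vx, vy, vz, yaw = vec[:4]
    return {
        "vx": clamp(vx, v_max),
        "vy": clamp(vy, v_max),
        "vz": clamp(vz, v_max),
        "yaw_rate": clamp(yaw, yaw_max),
    }
```

Clamping at this boundary means a bad model output degrades to a bounded command rather than an unbounded maneuver.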

Training

Parameter               Value
Base model              mistralai/Ministral-3-3B-Instruct-2512-BF16
Method                  LoRA (PEFT)
LoRA rank (r)           4
LoRA alpha              8
Target modules          q_proj, v_proj
Task type               CAUSAL_LM
Steps                   500
Learning rate           2e-4
Gradient accumulation   8
Grad clipping           0.3
Precision               bfloat16
Hardware                Google Colab T4 GPU (15 GB VRAM)
Training time           ~35 minutes
PEFT version            0.18.1
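The hyperparameters above translate into a PEFT configuration along these lines. This is a sketch, not the exact training script: argument names follow peft's LoraConfig API, and anything not in the table (dropout, bias handling) is left at defaults as an assumption:

```python
from peft import LoraConfig, get_peft_model

# Values taken from the training table above
lora_config = LoraConfig(
    r=4,
    lora_alpha=8,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
# base_model is the Ministral 3B model, loaded separately
# model = get_peft_model(base_model, lora_config)
```

With r=4 on only the query and value projections, the adapter touches a small fraction of the 3B parameters, which is what makes a single T4 run feasible.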

Dataset

AirSim RGB+Depth Drone Flight 10K — 1,000 RGB frames (320×320) from the Microsoft AirSim simulator, each paired with a NumPy telemetry array containing velocity and orientation data.

Each training example pairs a drone camera image with a telemetry vector (50 float values) representing the drone's state. The model learns to predict these vectors from visual input.
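Since the base model is a text generator, the 50-float target has to be serialized into text for the assistant turn. The exact format used in training is not documented here, so the rounding and separator below are assumptions; a plausible sketch of the round trip:

```python
import numpy as np

def telemetry_to_text(vec, precision=4):
    """Serialize a telemetry array as the text target the LM learns to emit.

    The fixed precision and space separator are assumptions about the
    training format, not documented facts.
    """
    return " ".join(f"{v:.{precision}f}" for v in vec)

def text_to_telemetry(s):
    """Inverse: parse a generated string back into a float array."""
    return np.array([float(t) for t in s.split()], dtype=np.float32)
```

Keeping serialization and parsing as exact inverses matters: any formatting drift between training targets and inference-time parsing shows up as silent telemetry error.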

Training loss

Step  64/500  loss=10.6414
Step 128/500  loss=9.5537
Step 192/500  loss=7.0885
Step 256/500  loss=4.6498
Step 320/500  loss=3.1225
Step 384/500  loss=2.4410
Step 448/500  loss=1.9873
Step 500/500  loss=1.7251

Loss decreased from 10.6 → 1.7 over 500 steps, indicating the adapter learned to map visual features to telemetry predictions.

Usage

import torch
from transformers import AutoProcessor, Mistral3ForConditionalGeneration
from peft import PeftModel
from PIL import Image

processor = AutoProcessor.from_pretrained("mistralai/Ministral-3-3B-Instruct-2512-BF16")

# Load the base vision-language model in bfloat16
model = Mistral3ForConditionalGeneration.from_pretrained(
    "mistralai/Ministral-3-3B-Instruct-2512-BF16",
    torch_dtype=torch.bfloat16,
)

# Apply the LoRA adapter, then merge it into the base weights for faster inference
model = PeftModel.from_pretrained(model, "BenBarr/flystral")
model = model.merge_and_unload().cuda().eval()

img = Image.open("drone_frame.jpg").convert("RGB")

messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Output the raw telemetry for this frame."},
]}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=text, images=[img], return_tensors="pt").to("cuda")

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=200, do_sample=False)

result = processor.decode(output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(result)  # Telemetry vector: vx, vy, vz, yaw_rate, ...
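The decoded string still has to be parsed back into numbers before it can drive the controller. A hedged sketch that extracts float literals robustly, since generated text may include labels or stray tokens; the expected length of 50 comes from the dataset description above:

```python
import re

def parse_telemetry(text, expected_len=50):
    """Extract all float literals from generated text.

    Returns None when the count doesn't match expected_len, so the
    caller can fall back to a safe hover command instead of flying
    on malformed output.
    """
    values = [
        float(m)
        for m in re.findall(r"-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?", text)
    ]
    return values if len(values) == expected_len else None
```

Returning None on a length mismatch forces the caller to handle bad generations explicitly rather than silently truncating or padding the vector.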

Architecture

The adapter sits in the Louise multi-agent drone escort system:

  • Flystral (this model) β€” flight control from camera images
  • Helpstral β€” safety/threat assessment from camera images (Pixtral 12B)
  • Louise β€” conversational safety companion (Ministral 3B)

When the fine-tuned endpoint is available, Flystral uses this adapter. When offline, it falls back to agentic mode on the base Ministral 3B via the Mistral API with function calling.
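The endpoint selection described above can be sketched as a simple guard. The probe callable and the backend labels here are hypothetical illustrations, not part of the released system:

```python
def choose_flight_backend(probe_endpoint):
    """Pick the control path for Flystral.

    probe_endpoint is a hypothetical callable that raises on failure
    (e.g. an HTTP health check against the fine-tuned endpoint).
    Returns a label for the selected backend.
    """
    try:
        probe_endpoint()
        return "flystral-adapter"    # fine-tuned endpoint reachable
    except Exception:
        # Fall back to agentic mode: base Ministral 3B via the
        # Mistral API with function calling
        return "ministral-agentic"
```

Probing at dispatch time (rather than caching availability) keeps the escort mission running through transient endpoint outages.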

Developed by

Ben Barrett β€” Mistral Worldwide Hackathon 2026
