
# GIS-Coder 7B – Training Package

Fine-tune Qwen2.5-Coder-7B-Instruct into a GIS code specialist using QLoRA SFT.

πŸ“ What's Included

File Description
train_7b.py Production training script with CLI args
evaluate.py Evaluation on 12 GIS benchmarks with scoring
requirements.txt All dependencies

Dataset: `RhodWeo/gis-code-instructions` – 70 expert-curated GIS code examples covering 13 Python libraries.

## 🚀 Quick Start

### 1. Install dependencies

```bash
pip install -r requirements.txt
```

### 2. Log in to Hugging Face

```bash
huggingface-cli login
```

### 3. Train (single GPU)

```bash
# Default settings (recommended for an A100 80GB)
python train_7b.py

# A10G / RTX 4090 (24 GB) - reduce batch size
python train_7b.py --batch_size 1 --grad_accum 16 --max_length 2048

# H100 - can afford a larger batch and sequence length
python train_7b.py --batch_size 4 --grad_accum 4 --max_length 8192

# Full-precision LoRA (no quantization, needs ~30 GB)
python train_7b.py --no_quantize --batch_size 1

# With Flash Attention (faster; requires flash-attn to be installed)
python train_7b.py --use_flash_attn

# With Trackio monitoring
python train_7b.py --use_trackio --trackio_project my-gis-coder
```

### 4. Multi-GPU

```bash
accelerate launch --num_processes 2 train_7b.py --batch_size 2 --grad_accum 4
```

### 5. Evaluate

```bash
# Evaluate the fine-tuned model
python evaluate.py --adapter_id RhodWeo/GIS-Coder-7B

# Compare with the base model
python evaluate.py --adapter_id RhodWeo/GIS-Coder-7B --compare_base

# Evaluate a local checkpoint
python evaluate.py --adapter_id ./gis-coder-7b-output/final
```

βš™οΈ Hyperparameter Guide

Recommended defaults (battle-tested recipe):

Parameter Value Source
--lr 2e-4 LoRA Without Regret (10Γ— base SFT rate)
--lora_r 32 MapCoder-Lite optimal for code tasks
--lora_alpha 16 Ξ±/r = 0.5
--target_modules all-linear LoRA Without Regret
--epochs 3 CFD paper: peak at epoch 2, decline after 4
--scheduler cosine Standard for LoRA
--warmup_ratio 0.1 CFD paper: 10% warmup
--max_length 4096 Covers longest GIS code examples
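These defaults map onto a PEFT + bitsandbytes setup along the following lines. This is a minimal sketch, not `train_7b.py` itself: the dropout value is an assumption (it is not listed in the table), and the script's actual wiring may differ.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization for QLoRA (skipped when training with --no_quantize)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# LoRA adapter matching the table: r=32, alpha=16 (alpha/r = 0.5), all-linear
lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    target_modules="all-linear",
    lora_dropout=0.05,  # assumption: dropout is not specified in the table
    task_type="CAUSAL_LM",
)
```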

Hardware-specific settings:

| GPU | VRAM | `--batch_size` | `--grad_accum` | `--max_length` | Notes |
|---|---|---|---|---|---|
| RTX 3090 | 24 GB | 1 | 16 | 2048 | QLoRA only |
| RTX 4090 | 24 GB | 1 | 16 | 2048 | QLoRA, slightly faster |
| A10G | 24 GB | 1 | 16 | 2048 | QLoRA only |
| L40S | 48 GB | 2 | 8 | 4096 | QLoRA or LoRA |
| A100 40GB | 40 GB | 2 | 8 | 4096 | Recommended minimum |
| A100 80GB | 80 GB | 2 | 8 | 4096 | Ideal |
| H100 | 80 GB | 4 | 4 | 8192 | Fastest |
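Note that every preset in the table keeps the effective batch size at 16, trading per-device batch for gradient-accumulation steps as VRAM shrinks. A quick sanity check:

```python
def effective_batch(batch_size: int, grad_accum: int, num_gpus: int = 1) -> int:
    """Effective batch = per-device batch x accumulation steps x GPUs."""
    return batch_size * grad_accum * num_gpus

# Single-GPU presets from the table all land on 16
assert effective_batch(1, 16) == 16  # 24 GB cards (QLoRA)
assert effective_batch(2, 8) == 16   # L40S / A100
assert effective_batch(4, 4) == 16   # H100

# The multi-GPU example (--num_processes 2, batch 2, accum 4) matches too
assert effective_batch(2, 4, num_gpus=2) == 16
```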

Ablation ideas:

```bash
# Higher LoRA rank (more capacity, slower)
python train_7b.py --lora_r 64 --lora_alpha 32

# Lower learning rate (more stable, slower convergence)
python train_7b.py --lr 5e-5

# More epochs (risks overfitting on 70 examples)
python train_7b.py --epochs 5

# Target only the attention layers (fewer params, faster)
python train_7b.py --target_modules q_proj,k_proj,v_proj,o_proj
```

## 📊 Expected Results

From our CPU training run with the 0.5B base model (70 examples, 3 epochs):

| Metric | Start → End |
|---|---|
| Loss | 1.52 → 0.88 (−42%) |
| Token accuracy | 69% → 79% |
| Eval quality score | 85% |

With the 7B model + QLoRA, expect significantly better results: the CFD paper achieved 88.7% accuracy with this exact recipe on a similarly sized domain-specific dataset.
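The −42% figure above is the relative loss reduction over training, which is quick to verify:

```python
# Relative loss reduction from the start/end losses in the results table
start_loss, end_loss = 1.52, 0.88
reduction_pct = round(100 * (start_loss - end_loss) / start_loss)
assert reduction_pct == 42
```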

## 📚 Dataset Details

70 examples covering 13 GIS Python libraries:

| Library | Examples | Why Important |
|---|---|---|
| OSMnx | 9 | All models score 0%: routing, POIs, isochrones |
| Rasterio | 9 | Satellite imagery, DEM, NDVI, reprojection |
| GeoPandas | 25 | Core: spatial joins, buffering, I/O |
| Shapely | 14 | Geometry operations, validation |
| MovingPandas | 3 | All models score 0%: GPS trajectories |
| GDAL | 6 | Raster processing, format conversion |
| PyProj | 2 | CRS handling (critical weakness) |
| H3 | 2 | Hexagonal indexing |
| Folium | 1 | Interactive maps |
| Fiona | 2 | Low-level vector I/O |
| xarray | 1 | Climate/raster datacubes |
| PyQGIS | 1 | Desktop GIS scripting |
| PySAL | 1 | Spatial statistics |

Each example includes:

- System prompt establishing GIS expertise
- Natural-language instruction
- Step-by-step chain-of-thought reasoning
- Complete, documented Python code
- Key points explaining design decisions
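Put together, an example looks roughly like the chat-format record below. This is an illustrative sketch only: the field names and wording are assumptions, not the dataset's actual schema.

```python
# Hedged sketch of one training example in chat format; the "messages"
# schema and all content strings here are illustrative assumptions.
example = {
    "messages": [
        {"role": "system",
         "content": "You are an expert GIS developer specializing in Python."},
        {"role": "user",
         "content": "Buffer every point in points.shp by 1 km."},
        {"role": "assistant",
         "content": (
             "Reasoning: buffering in metres needs a projected CRS, "
             "so reproject first.\n"
             "import geopandas as gpd\n"
             "gdf = gpd.read_file('points.shp').to_crs(3857)\n"
             "gdf['geometry'] = gdf.buffer(1000)\n"
             "Key point: buffering in a geographic CRS (degrees) gives "
             "meaningless distances."
         )},
    ]
}

# System prompt, instruction, and reasoning + code response, as listed above
assert [m["role"] for m in example["messages"]] == ["system", "user", "assistant"]
```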

## 🔬 Scaling to 20K+ Examples

To maximize quality, use the OSS-Instruct pattern (from Magicoder):

1. Crawl GitHub for GIS Python code (`import geopandas`, `import rasterio`, etc.)
2. Use GPT-4o to generate (instruction, solution) pairs from real code snippets
3. Execute and test all generated solutions
4. Add CoT annotations to passing examples (+20.9% pass@1 per the CFD paper)
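Step 1 amounts to an import filter over crawled files. A minimal sketch (the library list is a subset chosen for illustration):

```python
import re

# Match top-level "import X" / "from X ..." lines for a few target GIS libraries
GIS_IMPORT = re.compile(
    r"^\s*(?:import|from)\s+"
    r"(geopandas|rasterio|osmnx|shapely|movingpandas|pyproj)\b",
    re.MULTILINE,
)

def is_gis_snippet(code: str) -> bool:
    """Return True if the snippet imports at least one target GIS library."""
    return bool(GIS_IMPORT.search(code))

assert is_gis_snippet("import geopandas as gpd\n")
assert is_gis_snippet("from shapely.geometry import Point\n")
assert not is_gis_snippet("import numpy as np\n")
```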

Target: 20K–75K examples for production-grade GIS-Coder.

## 📖 References

| Paper | Key Insight |
|---|---|
| CFD Fine-tuning | QLoRA SFT recipe: a 7B model beats 72B on domain tasks |
| MapCoder-Lite | Qwen2.5-Coder-7B is the best backbone for code LoRA |
| GIS Benchmark | All models score 0% on OSMnx/MovingPandas |
| Magicoder | OSS-Instruct for synthetic data from real code |
| LoRA Without Regret | Target all-linear, r = 64–256, lr = 2e-4 |