GIS-Coder — A Code Model for Geographic Information Systems

A LoRA-adapted code model specialized for GIS and geospatial Python programming. Includes a ready-to-run training package for scaling up to 7B on your own GPU cluster.

📦 This Repo Contains

File	Description
`adapter_model.safetensors`	Trained LoRA adapter (0.5B base, proof of concept)
`train_7b.py`	Production 7B QLoRA training script with CLI args
`evaluate.py`	Evaluation suite (12 GIS benchmarks with scoring)
`requirements.txt`	All dependencies
`TRAINING_README.md`	Detailed training guide — hardware, hyperparameters, ablations

🚀 Train the 7B Model on Your GPUs

# 1. Clone this repo
git clone https://huggingface.co/RhodWeo/GIS-Coder-7B
cd GIS-Coder-7B

# 2. Install deps
pip install -r requirements.txt

# 3. Login
huggingface-cli login

# 4. Train! (A100 80GB recommended)
python train_7b.py

# For A10G/RTX 4090 (24GB):
python train_7b.py --batch_size 1 --grad_accum 16 --max_length 2048

# For H100:
python train_7b.py --batch_size 4 --grad_accum 4 --max_length 8192

# 5. Evaluate
python evaluate.py --adapter_id ./gis-coder-7b-output/final --compare_base

See TRAINING_README.md for the full guide with hardware-specific settings, ablation ideas, and expected results.

🗺️ GIS Libraries Covered (13)

Priority	Libraries	Coverage
Tier 1 (0% baseline)	OSMnx, MovingPandas, Rasterio, GDAL, PyProj	Heavy — these are where models fail
Tier 2	GeoPandas, Shapely, H3	Core GIS operations
Tier 3	Folium, xarray, PyQGIS, Fiona, PySAL	Real-world workflows

📊 Proof-of-Concept Results (0.5B)

Trained on CPU with the smaller base model to validate the approach:

Metric	Start → End
Loss	1.52 → 0.88 (−42%)
Token Accuracy	69.3% → 79.3% (+10pp)
Eval Quality	85% (code + library + CoT + function)

🔬 Training Recipe

Based on published research:

Principle	Source	Applied
QLoRA SFT beats 72B models	CFD paper	r=32, all-linear, lr=2e-4
Qwen2.5-Coder best backbone	MapCoder-Lite	Base model selection
Models score 0% on GIS	GIS Benchmark	Heavy OSMnx/MovingPandas coverage
CoT boosts +20.9% pass@1	CFD paper ablation	All examples include CoT
Target all linear layers	LoRA Without Regret	`target_modules="all-linear"`