GIS-Coder: A Code Model for Geographic Information Systems

A LoRA-adapted code model specialized for GIS and geospatial Python programming. Includes a ready-to-run training package for scaling up to 7B on your own GPU cluster.

📦 This Repo Contains

| File | Description |
| --- | --- |
| adapter_model.safetensors | Trained LoRA adapter (0.5B base, proof of concept) |
| train_7b.py | Production 7B QLoRA training script with CLI args |
| evaluate.py | Evaluation suite (12 GIS benchmarks with scoring) |
| requirements.txt | All dependencies |
| TRAINING_README.md | Detailed training guide: hardware, hyperparameters, ablations |

🚀 Train the 7B Model on Your GPUs

# 1. Clone this repo
git clone https://huggingface.co/RhodWeo/GIS-Coder-7B
cd GIS-Coder-7B

# 2. Install deps
pip install -r requirements.txt

# 3. Log in to Hugging Face
huggingface-cli login

# 4. Train! (A100 80GB recommended)
python train_7b.py

# For A10G/RTX 4090 (24GB):
python train_7b.py --batch_size 1 --grad_accum 16 --max_length 2048

# For H100:
python train_7b.py --batch_size 4 --grad_accum 4 --max_length 8192

# 5. Evaluate
python evaluate.py --adapter_id ./gis-coder-7b-output/final --compare_base

See TRAINING_README.md for the full guide with hardware-specific settings, ablation ideas, and expected results.
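
The two hardware presets above trade per-device batch size against gradient accumulation. As a quick check, assuming the usual definition (effective batch size = per-device batch size × gradient-accumulation steps × number of GPUs, with a single GPU assumed here), both presets reach the same effective batch size. The helper name `effective_batch_size` is illustrative and is not part of train_7b.py.

```python
def effective_batch_size(batch_size: int, grad_accum: int, num_gpus: int = 1) -> int:
    """Number of sequences that contribute to each optimizer step."""
    return batch_size * grad_accum * num_gpus


# Values copied from the commands above; a single GPU is an assumption.
presets = {
    "A10G / RTX 4090 (24GB)": {"batch_size": 1, "grad_accum": 16, "max_length": 2048},
    "H100": {"batch_size": 4, "grad_accum": 4, "max_length": 8192},
}

for name, p in presets.items():
    ebs = effective_batch_size(p["batch_size"], p["grad_accum"])
    # Upper bound only: real sequences are usually shorter than max_length.
    max_tokens = ebs * p["max_length"]
    print(f"{name}: effective batch = {ebs}, at most {max_tokens} tokens per step")
```

Since the effective batch size matches, the 24GB preset mainly lowers peak memory (smaller micro-batches and shorter sequences) rather than changing the number of sequences per optimizer step. The shorter max_length does truncate longer examples, so results may still differ.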

🗺️ GIS Libraries Covered (13)

| Priority | Libraries | Coverage |
| --- | --- | --- |
| Tier 1 (0% baseline) | OSMnx, MovingPandas, Rasterio, GDAL, PyProj | Heavy (these are where models fail) |
| Tier 2 | GeoPandas, Shapely, H3 | Core GIS operations |
| Tier 3 | Folium, xarray, PyQGIS, Fiona, PySAL | Real-world workflows |

📊 Proof-of-Concept Results (0.5B)

Trained on CPU with the smaller base model to validate the approach:

| Metric | Start → End |
| --- | --- |
| Loss | 1.52 → 0.88 (−42%) |
| Token Accuracy | 69.3% → 79.3% (+10pp) |
| Eval Quality | 85% (code + library + CoT + function) |
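
The two deltas in the table use different units: the loss change is relative (a percentage of the starting value), while the token-accuracy change is absolute (percentage points). A minimal sketch that reproduces both figures from the reported start and end values; the function names are illustrative.

```python
def relative_change_pct(start: float, end: float) -> float:
    """Change expressed as a percentage of the starting value."""
    return (end - start) / start * 100.0


def point_change(start_pct: float, end_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return end_pct - start_pct


# Start and end values taken from the results table.
loss_delta = relative_change_pct(1.52, 0.88)
acc_delta = point_change(69.3, 79.3)

print(f"Loss: {loss_delta:.0f}%")
print(f"Token accuracy: {acc_delta:+.0f}pp")
```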

🔬 Training Recipe

Based on published research:

| Principle | Source | Applied |
| --- | --- | --- |
| QLoRA SFT beats 72B models | CFD paper | r=32, all-linear, lr=2e-4 |
| Qwen2.5-Coder best backbone | MapCoder-Lite | Base model selection |
| Models score 0% on GIS | GIS Benchmark | Heavy OSMnx/MovingPandas coverage |
| CoT boosts +20.9% pass@1 | CFD paper ablation | All examples include CoT |
| Target all linear layers | LoRA Without Regret | target_modules="all-linear" |
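
To give a sense of what r=32 on all linear layers means for adapter size: LoRA adds two low-rank matrices to each adapted linear layer, A (r × d_in) and B (d_out × r), so each layer contributes r × (d_in + d_out) trainable parameters. The sketch below applies that formula. The layer shapes are hypothetical placeholders, not the actual dimensions of the base model.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one linear layer of shape (d_out, d_in)."""
    # A has shape (r, d_in) and B has shape (d_out, r).
    return r * (d_in + d_out)


R = 32  # rank from the recipe table

# Hypothetical (d_in, d_out) shapes, chosen only to illustrate the arithmetic.
example_layers = {
    "attention projection": (4096, 4096),
    "MLP up projection": (4096, 11008),
}

for name, (d_in, d_out) in example_layers.items():
    frozen = d_in * d_out
    added = lora_params(d_in, d_out, R)
    print(f"{name}: {added:,} adapter params vs {frozen:,} frozen ({added / frozen:.2%})")
```

Because the added count grows linearly in r, doubling the rank doubles the adapter size, while the frozen base weights stay untouched.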

📚 Dataset

RhodWeo/gis-code-instructions: 70 expert-curated examples with Chain-of-Thought annotations.
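
The dataset schema is not described here. As a rough sketch of how a Chain-of-Thought-annotated example might be flattened into a single training string, the snippet below assumes hypothetical fields `instruction`, `reasoning`, and `code`. The record is invented for illustration and is not taken from the dataset, and the section markers are likewise an assumption rather than the format train_7b.py uses.

```python
def format_example(record: dict) -> str:
    """Join instruction, reasoning, and code so the reasoning precedes the answer."""
    return (
        f"### Instruction\n{record['instruction']}\n\n"
        f"### Reasoning\n{record['reasoning']}\n\n"
        f"### Code\n{record['code']}\n"
    )


# Invented record with hypothetical field names (not from the real dataset).
example = {
    "instruction": "Reproject a GeoDataFrame to Web Mercator.",
    "reasoning": "Web Mercator is EPSG:3857; GeoPandas reprojects with to_crs.",
    "code": "gdf_3857 = gdf.to_crs(epsg=3857)",
}

text = format_example(example)
print(text)
```

Placing the reasoning before the code means the model is trained to produce its explanation first and the solution second.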

License

Apache 2.0
