Qwen3.6-27B-LoRA-fermipy

A LoRA adapter that fine-tunes Qwen/Qwen3.6-27B-FP8 to generate FermiPy YAML configurations and Python analysis scripts from natural-language descriptions of Fermi-LAT gamma-ray analyses.

The adapter is part of the Fermi-LLM project, which builds natural-language interfaces for Fermi Large Area Telescope data analysis.

Quick start with vLLM

vllm serve Qwen/Qwen3.6-27B-FP8 \
    --enable-lora \
    --max-loras 1 \
    --max-lora-rank 32 \
    --lora-modules fermipy=ai4helab/Qwen3.6-27B-LoRA-fermipy \
    --port 8000
Then query the adapter through the server's OpenAI-compatible API:

from openai import OpenAI

SYSTEM_PROMPT = """You are a FermiPy expert. Given a natural-language description of a Fermi-LAT gamma-ray analysis, generate two artifacts:
1. A YAML configuration file for FermiPy's GTAnalysis.
2. A Python script that uses the FermiPy API to perform the analysis.

Return EXACTLY one fenced block of each, in this order, with no extra prose:

### YAML Configuration:
```yaml
<yaml>
```

### Python Script:
```python
<python>
```"""

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="fermipy",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": (
            "Generate the FermiPy YAML configuration and Python script "
            "for the following analysis:\n\n"
            "Perform a spectral analysis of Mrk 421 (4FGL J1104.4+3812) "
            "between 1 GeV and 1 TeV using SOURCE-class events."
        )},
    ],
    temperature=0.1,
    top_p=0.95,
    max_tokens=4096,
)
print(resp.choices[0].message.content)

The model output is a single assistant turn that always opens with ### YAML Configuration: and a fenced YAML block, followed by ### Python Script: and a fenced Python block. Downstream parsers should look for those fences.
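
For example, a minimal parser along these lines (an illustrative sketch, not part of this repo; it assumes the reply uses ```yaml and ```python fence labels) recovers the two artifacts from a reply:

import re

def split_artifacts(reply: str) -> tuple[str, str]:
    """Return (yaml_text, python_text) extracted from one model reply."""
    yaml_match = re.search(r"```yaml\n(.*?)```", reply, re.DOTALL)
    py_match = re.search(r"```python\n(.*?)```", reply, re.DOTALL)
    if yaml_match is None or py_match is None:
        raise ValueError("reply does not contain both fenced blocks")
    return yaml_match.group(1), py_match.group(1)

yaml_text, script_text = split_artifacts(resp.choices[0].message.content)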

Quick start with PEFT (Python)

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_id    = "Qwen/Qwen3.6-27B-FP8"
adapter_id = "ai4helab/Qwen3.6-27B-LoRA-fermipy"

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_id, trust_remote_code=True, device_map="cuda:0", torch_dtype="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()
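
A single generation then follows the usual transformers pattern. The sketch below reuses the SYSTEM_PROMPT from the vLLM section and mirrors its sampling settings; the user prompt is just an example:

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": (
        "Generate the FermiPy YAML configuration and Python script "
        "for the following analysis:\n\n"
        "Compute a weekly light curve of Vela (4FGL J0835.3-4510) "
        "between 100 MeV and 300 GeV."
    )},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
with torch.no_grad():
    output = model.generate(
        input_ids, max_new_tokens=4096, do_sample=True, temperature=0.1, top_p=0.95
    )
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))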

What this adapter does (and doesn't)

Does

  • Generate a syntactically valid FermiPy YAML configuration with the right sections (logging, fileio, data, binning, selection, gtlike, model, plus optional analysis sections like sed, lightcurve).
  • Generate a matching Python script that calls the FermiPy API in the right order (GTAnalysis(config) → setup() → optimize() → fit() → analysis-specific calls like sed(), lightcurve()); see the sketch after this list.
  • Pick reasonable defaults (event class, IRFs, isotropic/galactic diffuse models) when the prompt doesn't specify them.
  • Map natural-language source aliases (e.g. "Mrk 421", "Crab Nebula", "Vela") to 4FGL catalog identifiers.
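
As an illustration of that call order, a script of this shape is typical (a hand-written sketch, not a verbatim model output; the config path, source name, and free-source radius are placeholders):

from fermipy.gtanalysis import GTAnalysis

gta = GTAnalysis("config.yaml", logging={"verbosity": 3})
gta.setup()                          # prepare data products and the ROI model
gta.optimize()                       # coarse fit of all ROI sources
gta.free_sources(distance=3.0)       # free nearby sources before the final fit
gta.fit()                            # full likelihood fit
sed = gta.sed("4FGL J1104.4+3812")   # analysis-specific call, here an SED
gta.write_roi("fit_results")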

Does not

  • Validate that the configuration will actually run on a given dataset. Path placeholders (<path_to_evfile>) are left intact when the user prompt doesn't supply real paths.
  • Replace domain expertise. Outputs should be reviewed by an astrophysicist before scientific use.
  • Generate analyses outside the Fermi-LAT / FermiPy ecosystem.

Training data

The training set is 118 prompt → (yaml, script) pairs (114 used after filtering empty fields):

  • 103 examples generated by larger LLMs and curated by humans (a Fermi-LAT astrophysicist reviewed each one for correctness). They cover diverse analysis types: spectral fits, light curves, SEDs, TS maps, pulsar timing, extension fitting, and ROI optimization.
  • 15 examples written entirely by humans: the high-trust polish set.

Each example is rendered through the model's chat template with a fixed system prompt (the one shown in the Quick start section). The assistant target is the canonical ### YAML Configuration: / ### Python Script: format.
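
Schematically, one record could be rendered like this (reusing tokenizer and SYSTEM_PROMPT from the quick-start sections; the field names of the raw dataset are an assumption, only the target format comes from this card):

def render_example(example: dict) -> str:
    """Render one prompt → (yaml, script) pair into chat-template text."""
    target = (
        "### YAML Configuration:\n```yaml\n" + example["yaml"] + "\n```\n\n"
        "### Python Script:\n```python\n" + example["script"] + "\n```"
    )
    return tokenizer.apply_chat_template(
        [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": example["prompt"]},
            {"role": "assistant", "content": target},
        ],
        tokenize=False,
    )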

Training procedure

A two-phase curriculum:

Phase  Data           Examples  Epochs  LR    Effective batch        Wall time
1      LLM-curated    100       6       1e-4  8 (bs=1, grad-acc=8)   30 min
2      Human-written  14        4       2e-5  8                      3 min
  • Optimizer: AdamW (weight_decay=0.01)
  • Schedule: cosine, warmup_ratio=0.1 for phase 1; no warmup for phase 2
  • Sequence length: 4096 (median example: 955 tokens; max: 1783)
  • Gradient checkpointing: on (use_reentrant=False)
  • Precision: bf16 LoRA weights
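
A rough reconstruction of the phase-1 settings in transformers.TrainingArguments terms (anything not listed above, such as the output directory, is a placeholder):

from transformers import TrainingArguments

phase1_args = TrainingArguments(
    output_dir="phase1",                     # placeholder
    num_train_epochs=6,
    learning_rate=1e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,           # effective batch size of 8
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    weight_decay=0.01,
    bf16=True,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
)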

LoRA configuration

Hyperparameter  Value
r               32
lora_alpha      64
lora_dropout    0.05
bias            none
task_type       CAUSAL_LM

Target modules (10 distinct projection types across the 64 transformer layers; Qwen3.6-27B has a hybrid architecture):

  • 16 standard self-attention layers: q_proj, k_proj, v_proj, o_proj
  • 48 linear-attention (Mamba-style) layers: in_proj_qkv, in_proj_z, out_proj
  • All 64 MLP layers: gate_proj, up_proj, down_proj

Trainable parameters: ~410 M, about 1.5 % of the 27 B base.
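
In PEFT terms this corresponds to roughly the following LoraConfig (the authoritative target-module list ships in the adapter's adapter_config.json; the names below just restate the groups above):

from peft import LoraConfig

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # standard self-attention layers
        "in_proj_qkv", "in_proj_z", "out_proj",   # linear-attention (Mamba-style) layers
        "gate_proj", "up_proj", "down_proj",      # MLP layers
    ],
)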

Loss curve

Phase 1 final training loss (averaged across logged steps): 0.41. Eval loss on a 2-example held-out split fell from 0.27 (epoch 1) to 0.24 (epoch 4). Phase 2 final loss: 0.23.

Full log history is in training_metrics.json in this repo.

Evaluation

The held-out test set is the standardized 3-target benchmark used across the Fermi-LLM project:

  • Mrk 421 (4FGL J1104.4+3812): blazar
  • Vela (4FGL J0835.3-4510): pulsar
  • Crab (4FGL J0534.5+2200): pulsar wind nebula

These targets and their configurations are never seen during training. The adapter's outputs are scored using the project's metric suite (YAML validity, Python syntax validity, FermiPy YAML key coverage, FermiPy API call coverage, parameter accuracy against reference 4FGL values).
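
The first two checks are simple enough to sketch here (illustrative only; the coverage and parameter-accuracy metrics are project-specific and not reproduced):

import ast
import yaml

def yaml_is_valid(yaml_text: str) -> bool:
    """YAML validity: the text must parse to a mapping."""
    try:
        return isinstance(yaml.safe_load(yaml_text), dict)
    except yaml.YAMLError:
        return False

def python_is_valid(script_text: str) -> bool:
    """Python syntax validity: the script must compile to an AST."""
    try:
        ast.parse(script_text)
        return True
    except SyntaxError:
        return False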

Hardware

  • Training: 1× NVIDIA H100 80 GB. Peak GPU memory ≈ 74 GB. Total wall time (excluding the one-time FP8→bf16 conversion of the base for training): ~12 minutes.
  • Inference: 1× H100 80 GB suffices for vLLM serving of the FP8 base plus this LoRA adapter, with ~50 GB free for KV cache.

Notes on the FP8 base

The Qwen3.6-27B-FP8 release uses DeepSeek-style block-wise FP8 quantization (128×128 blocks of float8_e4m3fn with bf16 weight_scale_inv factors). Training was done on a one-off bf16 dequantization of this same checkpoint because:

  1. PEFT's prepare_model_for_kbit_training casts all bf16 params to fp32, which OOMs on a single 80 GB GPU.
  2. The fine-grained FP8 path in transformers requires deep-gemm kernels that aren't fully packaged for every cluster.

The trained LoRA weights are dtype-independent and load cleanly on top of the original FP8 base. There is no quality loss from training in bf16, because the base weights were never modified; only the small LoRA adapter was trained.
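
For reference, the one-off conversion amounts to block-wise dequantization along these lines (a conceptual sketch under the assumption that each 128×128 FP8 block is multiplied by its stored scale factor; the actual conversion script is not part of this repo):

import torch

def dequant_blockwise(w_fp8: torch.Tensor, scale_inv: torch.Tensor, block: int = 128) -> torch.Tensor:
    """Expand per-block scales to full resolution and rescale the FP8 weights to bf16."""
    scale = scale_inv.repeat_interleave(block, dim=0)[: w_fp8.shape[0]]
    scale = scale.repeat_interleave(block, dim=1)[:, : w_fp8.shape[1]]
    return w_fp8.to(torch.bfloat16) * scale.to(torch.bfloat16)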

License

Apache 2.0, inherited from the base model.

Citation

If you use this adapter, please cite the parent project. (Citation block to be added once the paper is published.)
