Abacus-cve-v1.1

Abacus-cve-v1.1 is an iterative version of Abacus-cve, fine-tuned on an expanded dataset for security vulnerability fixing tasks.

What's New in v1.1

Compared to Abacus-cve (v1.0), this version is trained on an expanded dataset: the pool of distilled agent traces used for fine-tuning grew from 4k to 18.8k.

Model Description

Abacus-cve-v1.1 is based on Qwen3-32B, fine-tuned on 18.8k distilled agent traces from CVE reproduction tasks. The traces were generated by Claude Opus 4.5 running in a Mini SWE-Agent harness through the CVE-Factory pipeline.

Evaluation Results

Evaluated on LiveCVEBench-verified, PatchEval-verified, and Terminal-Bench-2.0 with temperature=0.6, reported as avg@5:

| Model | LiveCVEBench | PatchEval | Terminal-Bench-2.0 | Avg |
|---|---|---|---|---|
| Qwen3-32B (base) | 8.96 ± 1.75 | 5.64 ± 1.37 | 5.41 ± 1.70 | 6.67 |
| Abacus-cve (v1.0) | 36.50 ± 1.52 | 21.94 ± 1.46 | 20.14 ± 2.68 | 26.19 |
| Abacus-cve-v1.1 (Ours) | 40.33 ± 1.36 | 24.32 ± 0.76 | 21.57 ± 1.67 | 28.74 |
| Qwen3-Coder-30B | 11.29 ± 1.36 | 9.25 ± 0.95 | 11.01 ± 2.43 | 10.51 |
| Qwen3-Coder-480B | 29.14 ± 0.26 | 18.06 ± 0.72 | 25.17 ± 2.04 | 24.12 |
| MiniMax-M2 | 40.44 ± 1.42 | 25.11 ± 0.92 | 48.31 ± 2.44 | 37.95 |
| Kimi-K2.5 | 44.48 ± 1.32 | 32.07 ± 1.40 | 41.44 ± 3.12 | 39.33 |
| GPT-5.4 | 40.98 ± 1.62 | 32.95 ± 0.85 | 32.81 ± 2.16 | 35.58 |
| Claude Sonnet 4 | 34.79 ± 0.83 | 24.76 ± 1.98 | 26.52 ± 2.59 | 28.69 |
| Claude Sonnet 4.5 | 44.92 ± 2.71 | 29.16 ± 1.46 | 41.35 ± 1.38 | 38.47 |
| Claude Opus 4.5 | 51.58 ± 1.64 | 35.68 ± 1.00 | 60.67 ± 2.50 | 49.31 |

Key findings:

  • v1.1 vs v1.0: +3.83 on LiveCVEBench, +2.38 on PatchEval, +1.43 on Terminal-Bench-2.0
  • Scaling potential: Performance gains from 4k to 18.8k traces demonstrate continued improvement with more data, suggesting further scaling could yield additional gains
  • Competitive performance: Matches Claude Sonnet 4 overall (28.74 vs. 28.69 avg) on security tasks with a 32B model
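For context on the metric: avg@5 is the mean score over five independent evaluation runs, and the ± values are read here as the run-to-run standard deviation (an assumption about the table's notation, not something the card states). A minimal sketch of the aggregation, with made-up per-run scores:

```python
import statistics

def avg_at_k(scores):
    """Aggregate per-run benchmark scores into (mean, sample std)."""
    return statistics.mean(scores), statistics.stdev(scores)

# Illustrative per-run resolve rates (%) for five runs -- not real data.
runs = [39.1, 41.0, 40.2, 41.8, 39.5]
mean, std = avg_at_k(runs)
print(f"avg@5 = {mean:.2f} ± {std:.2f}")  # avg@5 = 40.32 ± 1.10
```

Running each benchmark five times and reporting mean ± spread is what makes small deltas (e.g. the +1.43 on Terminal-Bench-2.0) interpretable against the listed uncertainty.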

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "Luoberta/Abacus-cve-v1.1",
    torch_dtype="auto",   # load in the checkpoint's native BF16
    device_map="auto",    # place weights across available devices
)
tokenizer = AutoTokenizer.from_pretrained("Luoberta/Abacus-cve-v1.1")
```

Citation

@misc{luo2026cvefactory,
  title={CVE-Factory: Scaling Expert-Level Agentic Tasks for Code Security Vulnerability},
  author={Xianzhen Luo and Jingyuan Zhang and Shiqi Zhou and Rain Huang and Chuan Xiao and Qingfu Zhu and Zhiyuan Ma and Xing Yue and Yang Yue and Wencong Zeng and Wanxiang Che},
  year={2026},
  eprint={2602.03012},
  archivePrefix={arXiv},
  primaryClass={cs.CR},
  url={https://arxiv.org/abs/2602.03012}
}