MiniMax-M2.5/.eval_results/swe_bench_verified.yaml
Add evaluation results on SWE-Bench Verified (#42)
1825d90
- dataset:
    id: SWE-bench/SWE-bench_Verified
    task_id: swe_bench_%_resolved
  value: 75.80
  source:
    url: https://www.swebench.com/
    name: SWE-Bench official evaluation
    user: nielsr
  notes: high reasoning