Step-3.5-Flash/.eval_results/terminal_bench_2.yaml
Commit ab446a3 by hzwer: Add evaluation results from Step 3.5 Flash paper
223 Bytes
- dataset:
    id: harborframework/terminal-bench-2.0
    task_id: terminalbench_2
  value: 51.0
  date: '2026-02-11'
  source:
    url: https://arxiv.org/abs/2602.10604
    name: Step 3.5 Flash Paper
    user: SaylorTwift