SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks
Paper • 2603.24755 • Published • 27
Understanding Behavior Cloning with Action Quantization