Cascade-2's cheating attempts on math are... cute

#25
by chankhavu - opened

I was running inference with this model on a subset of the harder problems from Nemotron-Math-V2, hoping to get a decent dataset for fun. I noticed that the model created a bunch of unrelated files in the sandbox folder (which, foolishly, was not network-isolated in my case).

It downloaded IMO2021SL.pdf, imo2022sl.pdf, and so on. I opened those files and they were legitimate PDFs, which means the model tried to cheat by downloading them 😄 Here are the statistics on network access attempts (out of 10K reasoning traces):

| Domain | Traces |
|---|---|
| artofproblemsolving.com | 64 |
| en.wikipedia.org | 58 |
| www.google.com | 40 |
| duckduckgo.com | 39 |
| oeis.org | 28 |
| math.stackexchange.com | 13 |
| api.stackexchange.com | 12 |
| www.imo-official.org | 11 |
| html.duckduckgo.com | 11 |
| raw.githubusercontent.com | 8 |
| api.duckduckgo.com | 6 |
| mathworld.wolfram.com | 6 |
| purplecomet.org | 5 |
| api.github.com | 3 |
| arxiv.org | 3 |
| www.bing.com | 3 |
| stackoverflow.com | 3 |
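
For anyone who wants to run a similar tally on their own traces, here is a rough sketch of how a per-domain count like the one above could be computed. It is not the script used for these numbers (that analysis was done by Claude Code). The function names and the assumption that each trace is available as a plain string containing the tool calls are mine. Each domain is counted at most once per trace, so the numbers are "traces that touched this domain", not total requests.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Matches http(s) URLs up to the first whitespace, quote, or closing bracket.
URL_RE = re.compile(r"""https?://[^\s'"<>)\]]+""")


def domains_in_trace(trace: str) -> set[str]:
    """Return the set of hostnames that appear in URLs inside one trace."""
    hosts = set()
    for url in URL_RE.findall(trace):
        try:
            host = urlparse(url).hostname
        except ValueError:
            # Malformed URL (e.g. an unbalanced IPv6 bracket); skip it.
            continue
        if host:
            hosts.add(host)
    return hosts


def count_traces_per_domain(traces: list[str]) -> Counter:
    """Count, for each domain, how many traces reference it at least once."""
    counts = Counter()
    for trace in traces:
        counts.update(domains_in_trace(trace))
    return counts


if __name__ == "__main__":
    # Hypothetical toy traces, only to show the call shape.
    example = [
        "curl https://artofproblemsolving.com/a ; curl https://artofproblemsolving.com/b",
        "wget 'https://oeis.org/A000045'",
        "no network access here",
    ]
    for domain, n in count_traces_per_domain(example).most_common():
        print(domain, n)
```

Note that this only catches URLs written out literally in the trace. Requests built from string concatenation or made by bare hostname would need a proxy or DNS log on the sandbox side instead.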

The cutest part is that in most of these cheating attempts, it fails to produce the correct answer. Here is an in-depth analysis, together with the exported reasoning traces with tool use: https://huggingface.co/datasets/chankhavu/nemotron-cascade2-cheating-attempts (the analysis was done by Claude Code)
