Model "thinks" for too long
They acknowledged the overthinking issue and said they are working to fix it without losing performance. I hope it won't take long, because the overthinking sometimes makes the model unusable; on some riddles and problems it spent over 25 minutes thinking, which is overkill.
Hope they fix it soon. The performance is actually good for a 3B model and I'd really like to use it, but this issue needs to be resolved!
Let’s join first; winning is just a matter of time.
Did you try these parameters: --temp 0.6 --top-p 0.95 --top-k 40 --min-p 0.01?
From my observations these might be best, and they may help with thinking time.
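For anyone new to llama.cpp, these flags go straight to its `llama-cli` tool; a rough example invocation (the model path here is a placeholder, not something from this thread):

```shell
# Hypothetical invocation; substitute your own GGUF path.
./llama-cli -m ./model-Q6_K.gguf \
  --temp 0.6 --top-p 0.95 --top-k 40 --min-p 0.01
```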
I've tried the first two but not the last two. I'll give it a try. Thanks.
Setting top_k to 0 indeed shortens the thought chain.
OK, but what is more important, the CoT or the output quality? When I was testing top_k, anything other than 40 produced worse results, mostly in code (same with min_p 0.01).
However, 40 is just the default value in llama.cpp; there's nothing magical about it, nor any specific reason to stick with it. Top_p alone should be sufficient to control token selection, and setting min_p to 0.01 likely won't make a difference.
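To make the interaction between these samplers concrete, here is a minimal stand-alone sketch of how a top-k / top-p / min-p chain filters candidate tokens. The function is my own illustration, not llama.cpp's actual API, and I'm assuming llama.cpp's convention that top_k <= 0 disables the top-k filter (which matches the top_k = 0 observation above):

```python
import math

def filter_candidates(logits, top_k=40, top_p=0.95, min_p=0.01):
    """Return the token ids that survive top-k, top-p and min-p filtering.

    Simplified sketch of a sampler chain; names and ordering are
    illustrative, not llama.cpp's real implementation.
    """
    # Softmax over the raw logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Sort candidate ids by probability, highest first.
    ids = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # top-k: keep only the k most likely tokens (<= 0 disables the filter).
    if top_k > 0:
        ids = ids[:top_k]

    # top-p (nucleus): keep the smallest prefix whose mass reaches top_p.
    kept, mass = [], 0.0
    for i in ids:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    ids = kept

    # min-p: drop tokens below min_p * P(most likely surviving token).
    threshold = min_p * probs[ids[0]]
    return [i for i in ids if probs[i] >= threshold]

# With a peaked distribution, only the dominant tokens survive.
logits = [8.0, 6.0, 2.0, 1.0, 0.5]
print(filter_candidates(logits))  # → [0, 1]
```

Note that on a peaked distribution like this one, top-p already cuts the tail before top-k or min-p ever matter, which is consistent with the point that top_p alone can be sufficient.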
Where are you running it? What is the program on your screen?
Jan (llama.cpp backend), Nanbeige4.1-3B-heretic Q6_K GGUF, Metal, no prompt, f16 K / q8_0 V cache.
Top-k is indeed an important value for LLMs. The K value is analogous to a light source that the LLM shines on its dark knowledge base: a higher K means more light channels are exposed to the model's dark knowledge, so there is a higher chance of retrieving information related to what the user asked for from its dark, interconnected knowledge base. A low K value means the LLM stays completely dark, while a very high K value exposes it to too much knowledge at once. Too much knowledge can cause knowledge paralysis in the model, so it is nice to have K = 20 or K = 40, so the model is only exposed to an adequate amount of its dark knowledge and gives us accurate, sensible answers.
I read a lot of AI research papers for fun; this is my understanding of the purpose of the K value.

