Hi, I built a small `llama.cpp` patchset for more graceful reasoning-length control. Instead of hard-cutting reasoning at token N, this patch progressively increases the logit of the `</think>` token as thinking gets longer.

What it adds:

- a `think_control` JSON field in the OpenAI-compatible request body

This was motivated by the same kind of problems discussed in:
Repository:
https://github.com/tls5657/think-control
If there is interest, I can also share benchmark results and tuning notes for Qwen-style reasoning models and VLM + structured-output workloads.
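To make the idea concrete, here is a minimal sketch of one way such a soft length control could be shaped. All names, parameters, and defaults below (`think_close_bias`, `start`, `ramp`, `max_bias`) are illustrative assumptions, not the patch's actual knobs:

```python
import math

def think_close_bias(thinking_tokens: int,
                     start: int = 512,
                     ramp: int = 1024,
                     max_bias: float = 8.0) -> float:
    """Return an additive logit bias for the </think> token.

    Zero until `start` thinking tokens have been emitted, then ramps
    smoothly up to `max_bias` over the next `ramp` tokens, so longer
    thoughts become increasingly likely to stop instead of being
    hard-cut at a fixed budget. Hypothetical sketch, not the patch.
    """
    if thinking_tokens <= start:
        return 0.0
    # Cosine ease-in: smooth progress in [0, 1] over the ramp window.
    t = min((thinking_tokens - start) / ramp, 1.0)
    return max_bias * (1.0 - math.cos(math.pi * t)) / 2.0
```

During sampling, the returned value would simply be added to the raw logit of the `</think>` token each step; a smooth ramp like this avoids the abrupt truncation mid-sentence that a hard token cap can cause.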