VibeThinker, a 3-billion-parameter language model, outperforms Opus 4.5 on reasoning tasks using a novel SFT+GRPO training approach (supervised fine-tuning combined with Group Relative Policy Optimization). The model was introduced in a paper on arXiv, and details were shared in a Reddit post.
VibeThinker: 3B-parameter model beats Opus 4.5 in reasoning