VibeThinker, a 3-billion-parameter language model, reportedly outperforms Opus 4.5 on reasoning tasks using a novel training approach that combines supervised fine-tuning (SFT) with Group Relative Policy Optimization (GRPO). The model was introduced in a paper available on arXiv, with details shared in a Reddit post.
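
For readers unfamiliar with GRPO, the core idea is that each sampled answer is scored relative to the other answers drawn for the same prompt, rather than against a learned value model. The sketch below shows only that generic group-normalization step; it is not taken from the VibeThinker paper, and the function name, the `eps` constant, and the example rewards are all illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions for a single prompt.

    Generic GRPO-style step (an assumption, not the paper's exact recipe):
    advantage_i = (reward_i - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards: 1.0 if a sampled answer was judged correct, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Answers that beat the group average get a positive advantage and are reinforced; answers below it get a negative one. If every answer in a group earns the same reward, all advantages are zero and that group contributes no learning signal.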