The TUDUM project presents a pipeline for adapting the Qwen3.5-27B model to reason explicitly in Turkish, rather than only translating prompts or answers.
- The pipeline first applies supervised fine-tuning (SFT) with LoRA adapters on 15,991 Turkish reasoning examples.
- It then applies GRPO-family reinforcement learning (RL) in a proxy-filtered Turkish mathematics environment.
- SFT reduced average response length and thinking exhaustion but lowered benchmark accuracy.
- RL recovered some mathematical performance, particularly on AIME24, but did not exceed the base model's Macro-6 average.
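
The defining step of GRPO-style methods is scoring each sampled response relative to the other responses drawn for the same prompt. The sketch below illustrates that step only and is not the project's training code. The function name, the example rewards, the `eps` value, and the use of the population standard deviation are assumptions made for illustration. Variants in the GRPO family differ on details such as whether to divide by the standard deviation at all.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so a
    response is credited only for doing better than its siblings.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical binary correctness rewards for four sampled answers
# to a single Turkish math problem (1.0 = correct, 0.0 = incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# Correct answers receive positive advantages, incorrect ones negative.
for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.1f} advantage={advantage:+.3f}")
```

When every response in a group earns the same reward, all advantages are zero and that prompt contributes no learning signal. This is one common reason to filter an RL environment for problems of suitable difficulty.
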
The step-50 model is publicly released and is presented as a technically honest evaluation of reasoning carried out in Turkish.