TUDUM adapts Qwen3.5-27B for Turkish reasoning via SFT and RL

The TUDUM project presents a pipeline that adapts the Qwen3.5-27B model to reason explicitly in Turkish, rather than merely translating prompts or answers.

The pipeline first applies supervised fine-tuning (SFT) with LoRA adapters on 15,991 Turkish reasoning examples.
It then applies GRPO-family reinforcement learning (RL) in a proxy-filtered Turkish mathematics environment.
SFT reduced average response length and thinking exhaustion but lowered benchmark accuracy.
RL recovered some mathematical performance, particularly on AIME24, but did not exceed the base model's Macro-6 average.
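
For readers unfamiliar with GRPO, the step that gives the method family its name is the group-relative advantage: several responses are sampled for the same prompt, each is scored, and every reward is normalized against the mean and spread of its own group. The sketch below illustrates only that generic step. The function name, the epsilon value, the use of the population standard deviation, and the 0/1 correctness rewards are assumptions made for illustration, not details taken from the TUDUM pipeline, whose exact GRPO variant is not specified here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group.

    `rewards` holds one scalar per response sampled for the SAME prompt.
    Hypothetical helper: the name, `eps`, and the population standard
    deviation are illustrative choices, not the project's implementation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every response gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one math prompt, scored 1 if correct, else 0
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct answers receive a positive advantage and incorrect ones a negative advantage, while a group in which all answers score the same yields zero advantages and therefore no learning signal for that prompt.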

The step-50 model is publicly released and is presented as a technically honest evaluation of reasoning carried out in Turkish.