The TUDUM project presents a pipeline for adapting the Qwen3.5-27B model to perform explicit reasoning in Turkish, rather than just translating prompts or answers.

  • The pipeline first applies supervised fine-tuning (SFT) with LoRA adapters on 15,991 Turkish reasoning examples.
  • It then applies GRPO-family reinforcement learning (RL) in a proxy-filtered Turkish mathematics environment.
  • SFT reduced average response length and thinking exhaustion but lowered benchmark accuracy.
  • RL recovered some mathematical performance, particularly on AIME24, but did not exceed the base model's Macro-6 average.
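
As a rough illustration of the RL stage, the sketch below shows the group-relative advantage computation that GRPO-style methods are generally built around: several responses are sampled for one prompt, each is scored, and every reward is normalized against the mean and standard deviation of its own group. This is not the project's training code. The function name, the binary reward values, the `eps` constant, and the use of the population standard deviation are all assumptions made for illustration, and the specific GRPO variant used by TUDUM may differ.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one prompt's sampled responses within the group.

    Hypothetical helper: A_i = (r_i - mean(r)) / (std(r) + eps).
    The population standard deviation is an assumption; some
    implementations use the sample standard deviation instead.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every response gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one Turkish math problem:
# 1.0 if the final answer was judged correct, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Correct answers receive a positive advantage and incorrect ones a negative advantage. A group in which every response earns the same reward yields all-zero advantages and therefore contributes no learning signal, which is one common motivation for filtering problems before RL.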

The step-50 model is publicly released and is presented as a technically honest evaluation of Turkish-thinking reasoning.