The authors present DistilledGemma, an efficient system for person-place relation extraction from multilingual historical newspaper articles in English, German, and French. The approach uses a three-stage knowledge distillation pipeline to balance classification accuracy against computational cost.

  • A first stage explored prompt engineering across eight large language models to identify the most effective reasoning architecture.
  • A second stage applied supervised fine-tuning via QLoRA to a Gemma 4 26B teacher model to generate silver-standard chain-of-thought traces.
  • A final stage performed response-level distillation to transfer reasoning patterns into a compact Gemma 4 E2B student model with approximately 2.3B effective parameters.
  • The team ranked 3rd on the standard test set (0.688 accuracy profile mean) and 2nd on the binary test set (0.8156 mean score).
  • With the LoRA adapters merged into the base weights for inference, the configuration also ranked 2nd in the balanced efficiency-accuracy profile on both test sets.
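
The response-level distillation step in the third stage can be illustrated with a minimal sketch of how teacher traces might be turned into student training pairs. Everything specific here is an assumption made for illustration and is not taken from the paper: the `TeacherTrace` fields, the `Answer:` marker, the `TRUE`/`FALSE` label strings, the prompt wording, and the choice to keep only traces whose final answer agrees with the reference label.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TeacherTrace:
    """One teacher output for a (person, place) pair. Field names are hypothetical."""
    article: str      # newspaper text
    person: str
    place: str
    gold_label: str   # reference annotation
    response: str     # teacher reasoning ending in "Answer: <label>"


def extract_answer(response: str) -> Optional[str]:
    """Return the label after the last 'Answer:' marker, or None if it is missing."""
    marker = "Answer:"
    idx = response.rfind(marker)
    if idx == -1:
        return None
    tail = response[idx + len(marker):].split()
    return tail[0].strip(".,;:!").upper() if tail else None


def build_student_examples(traces):
    """Keep traces whose final answer matches the gold label (an assumed filter)
    and format them as prompt/completion pairs for supervised fine-tuning."""
    examples = []
    for t in traces:
        if extract_answer(t.response) != t.gold_label.upper():
            continue  # drop traces that reason their way to the wrong label
        prompt = (
            f"Article: {t.article}\n"
            f"Person: {t.person}\nPlace: {t.place}\n"
            "Is the person located at the place? Reason step by step, then answer."
        )
        # The student is trained to reproduce the full teacher response,
        # reasoning included, rather than to match teacher logits.
        examples.append({"prompt": prompt, "completion": t.response})
    return examples
```

The student then learns from the teacher's generated text alone, which is what distinguishes response-level distillation from logit-level variants that require access to the teacher's output distribution.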

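Merging LoRA adapters for inference amounts to folding the low-rank update into the base weight matrix once, so that each forward pass uses a single matrix instead of the base path plus an adapter path. The sketch below shows the standard LoRA merge, W_merged = W + (alpha / r) * B A, on toy matrices in pure Python. The matrix values, `alpha`, and `r` are illustrative assumptions; the source does not report the adapter rank or scaling used.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def merge_lora(w, a, b, alpha, r):
    """Return W + (alpha / r) * B @ A, the standard LoRA merge.

    w: base weight, shape (out, in)
    a: LoRA down-projection, shape (r, in)
    b: LoRA up-projection, shape (out, r)
    """
    scale = alpha / r
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def forward_unmerged(w, a, b, alpha, r, x):
    """Compute W x + (alpha / r) * B (A x) without merging (two extra matmuls per call)."""
    base = matvec(w, x)
    update = matvec(b, matvec(a, x))
    return [bi + (alpha / r) * ui for bi, ui in zip(base, update)]


# Toy values chosen only for illustration.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]          # rank r = 1
B = [[0.5], [1.0]]
W_merged = merge_lora(W, A, B, alpha=2.0, r=1)
```

Because the merged and unmerged forms compute the same outputs, merging changes inference cost but not predictions, which is consistent with its use for the efficiency-oriented ranking.
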
These results indicate that knowledge distillation is a practical and scalable approach to historical document processing: a compact student model achieves competitive rankings while keeping inference cost low.