A fine-tuned 0.5B-parameter Qwen2.5 model achieves 0.83 micro-F1 on general-domain relation extraction, surpassing zero-shot GPT-5.4 and Claude Sonnet 4.6. On literary benchmarks, it reaches 0.92 on the Biographical dataset, exceeding the accuracy of frontier models, including GPT-5.4. These results demonstrate that small, task-adapted models can deliver high performance with minimal hardware requirements and privacy overhead.
Small Language Models Outperform Frontier LLMs in Relation Extraction
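For readers unfamiliar with the micro-F1 metric cited in the summary, the following is a minimal sketch of how it is commonly computed for relation extraction: true positives, false positives, and false negatives are pooled across all relation types before precision and recall are taken. The function name `micro_f1`, the label names, and the choice to exclude a `no_relation` class are illustrative assumptions; the source does not state how its scores were computed.

```python
def micro_f1(gold, pred, ignore_label=None):
    """Micro-averaged F1 over parallel lists of gold and predicted labels.

    ignore_label is an assumed convention: a "no relation" class that
    counts neither as a correct prediction nor as a target to recover.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if g == p and g != ignore_label:
            tp += 1  # correct, non-ignored relation
        else:
            if p != ignore_label:
                fp += 1  # predicted a relation that is wrong or spurious
            if g != ignore_label:
                fn += 1  # missed or mislabeled a gold relation

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels, for illustration only.
gold = ["born_in", "spouse", "no_relation", "educated_at"]
pred = ["born_in", "no_relation", "spouse", "educated_at"]
score = micro_f1(gold, pred, ignore_label="no_relation")
print(f"micro-F1 = {score:.3f}")
```

Because counts are pooled before averaging, frequent relation types weigh more heavily in micro-F1 than rare ones, which is worth keeping in mind when comparing scores across datasets with different label distributions.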