Small Language Models Outperform Frontier LLMs in Relation Extraction

A 300M-parameter small language model (SLM) fine-tuned on general-domain data achieves 0.83 micro-F1 on general-domain relation extraction, surpassing zero-shot GPT-5.4 and Claude Sonnet 4.6. On literary benchmarks, the SLM reaches 0.92 on the Biographical dataset, outperforming GPT-5.4, and exceeds the frontier models on average. These results indicate that task-adapted small models can deliver accurate relation extraction while keeping data private and running on modest hardware, without relying on large-scale generative models.

Benchmark	Model	Score (micro-F1)
General-domain relation extraction	Qwen2.5-0.5B	0.83
General-domain relation extraction	GPT-5.4	0.69
General-domain relation extraction	Claude Sonnet 4.6	0.66
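
For readers unfamiliar with the metric, the following is a minimal sketch of how a micro-F1 score for relation extraction can be computed. It assumes that a predicted relation counts as correct only when its (head, relation, tail) triple exactly matches a gold triple; the source does not state its matching criterion, and the function name, relation labels, and example triples are hypothetical, not taken from the evaluated datasets.

```python
from typing import List, Set, Tuple

# A relation is modelled as (head entity, relation label, tail entity).
Triple = Tuple[str, str, str]


def micro_f1(gold: List[Set[Triple]], pred: List[Set[Triple]]) -> float:
    """Micro-averaged F1: pool counts over all documents, then compute F1 once."""
    tp = fp = fn = 0
    for gold_doc, pred_doc in zip(gold, pred):
        tp += len(gold_doc & pred_doc)  # predicted and correct
        fp += len(pred_doc - gold_doc)  # predicted but not in gold
        fn += len(gold_doc - pred_doc)  # in gold but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical two-document example (assumed exact-match scoring).
gold = [
    {("Marie Curie", "born_in", "Warsaw"), ("Marie Curie", "spouse", "Pierre Curie")},
    {("Ada Lovelace", "born_in", "London")},
]
pred = [
    {("Marie Curie", "born_in", "Warsaw")},
    {("Ada Lovelace", "born_in", "London"), ("Ada Lovelace", "spouse", "Charles Babbage")},
]

score = micro_f1(gold, pred)
print(f"micro-F1 = {score:.3f}")
```

In this example the pooled counts are 2 true positives, 1 false positive, and 1 false negative, so precision and recall are both 2/3. Because micro-averaging pools counts before computing F1, frequent relation types weigh more heavily than rare ones.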

Benchmarks