A 300M-parameter small language model (SLM) fine-tuned on general-domain data achieves 0.83 micro-F1 on general-domain relation extraction, surpassing zero-shot GPT-5.4 and Claude Sonnet 4.6. On literary benchmarks, the SLM scores 0.92 on the Biographical dataset, outperforming GPT-5.4, and exceeds the frontier models on average. These results demonstrate that task-adapted small models can deliver accurate, private, and hardware-efficient relation extraction without relying on large-scale generative models.
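The headline numbers are micro-F1 scores. The blurb does not say how predictions are matched, so the sketch below assumes the common convention: each document yields a set of (head, relation, tail) triples, an exact match counts as a true positive, and true positives, false positives, and false negatives are pooled across all documents before precision and recall are computed. The `micro_f1` function and the example triples are hypothetical and only illustrate the metric; they are not the paper's evaluation code, whose matching rules may differ.

```python
from collections import Counter

def micro_f1(gold, pred):
    """Micro-averaged F1 over (head, relation, tail) triples.

    Assumption: exact triple matching; counts are pooled across all
    documents before computing precision and recall.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g_counts, p_counts = Counter(g), Counter(p)
        # Multiset intersection: triples present in both gold and prediction.
        overlap = sum((g_counts & p_counts).values())
        tp += overlap
        fp += sum(p_counts.values()) - overlap  # predicted but not gold
        fn += sum(g_counts.values()) - overlap  # gold but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical gold and predicted triples for two short biographical texts.
gold = [
    [("Ada Lovelace", "born_in", "London"), ("Ada Lovelace", "child_of", "Lord Byron")],
    [("Mary Shelley", "wrote", "Frankenstein")],
]
pred = [
    [("Ada Lovelace", "born_in", "London")],
    [("Mary Shelley", "wrote", "Frankenstein"), ("Mary Shelley", "born_in", "Paris")],
]
print(round(micro_f1(gold, pred), 3))  # 0.667
```

Here 2 true positives, 1 false positive, and 1 false negative give precision and recall of 2/3 each, so micro-F1 is about 0.667. Because counts are pooled, documents with many relations weigh more than in a macro (per-document or per-relation) average.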
arXiv cs.CL · 2d ago · src: 4d ago · research
Small Language Models Outperform Frontier LLMs in Relation Extraction
Alibaba (Qwen)
Evaluation & benchmarks
Reasoning models