Researchers introduce STEB to standardize style embedding evaluation

The researchers present the Style Text Embedding Benchmark (STEB), a comprehensive open-source benchmark designed to standardize the evaluation of style embeddings, which until now have been assessed with fragmented and inconsistent methods.

STEB spans 96 datasets in 7 languages.
The benchmark covers tasks such as authorship verification, authorship retrieval, AI-generated text detection, and probing of linguistic features.
Evaluation results show that semantic embeddings consistently fail at stylistic tasks.
No single style embedding is universally superior across all evaluated tasks.

STEB aims to provide a unified framework for assessing style embeddings, addressing the field's lack of standardized evaluation practices.