NAVER LABS Europe ties for first at IWSLT 2026 instruction-following speech track

NAVER LABS Europe submitted a system to the instruction-following speech processing short track at IWSLT 2026 and tied for first place in the overall ranking. The team developed systems that jointly perform automatic speech recognition (ASR), speech translation (ST), and spoken question answering (SQA) from English speech into Chinese, Italian, and German.

Replaces the previous speech projector with SpeechMapper, which learns a speech-to-LLM embedding projector using only ASR data.
Introduces fakACL, a synthetic SQA dataset of artificially generated scientific presentations: the presentation content is produced by prompting an LLM backbone, and the speech is synthesized with SeamlessM4T-large-v2.
The combination of improved speech projection and domain-specific synthetic data allows the model to outperform last year's best system while being more compact and relying on a weaker LLM backbone.
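
To make the idea of learning a speech-to-LLM embedding projector from ASR data alone more concrete, here is a toy sketch. It is not SpeechMapper's actual architecture or training recipe, which this summary does not describe. It assumes one plausible reading: speech encoder frames are pooled to the transcript length, passed through a linear projector, and regressed with a mean-squared error onto the frozen LLM's input embeddings of the transcript tokens. All names (`mean_pool`, `project`, `train_projector`), the dimensions, and the random data are hypothetical stand-ins.

```python
import random

def mean_pool(frames, n_tokens):
    """Average contiguous chunks of frames so the output has n_tokens rows."""
    chunk = len(frames) / n_tokens
    pooled = []
    for t in range(n_tokens):
        start = int(t * chunk)
        end = max(int((t + 1) * chunk), start + 1)
        rows = frames[start:end]
        pooled.append([sum(col) / len(rows) for col in zip(*rows)])
    return pooled

def project(W, x):
    """Linear map from speech feature space to the LLM embedding space."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def mse(W, inputs, targets):
    """Mean-squared error between projected speech and target embeddings."""
    total, count = 0.0, 0
    for x, y in zip(inputs, targets):
        for p, yi in zip(project(W, x), y):
            total += (p - yi) ** 2
            count += 1
    return total / count

def train_projector(W, inputs, targets, lr=0.05, steps=200):
    """Plain gradient descent on the MSE; only the projector W is updated."""
    d_out, d_in = len(W), len(W[0])
    n = len(inputs) * d_out
    for _ in range(steps):
        grad = [[0.0] * d_in for _ in range(d_out)]
        for x, y in zip(inputs, targets):
            pred = project(W, x)
            for i in range(d_out):
                err = 2.0 * (pred[i] - y[i]) / n
                for j in range(d_in):
                    grad[i][j] += err * x[j]
        for i in range(d_out):
            for j in range(d_in):
                W[i][j] -= lr * grad[i][j]
    return W

rng = random.Random(0)
D_SPEECH, D_LLM, N_FRAMES, N_TOKENS = 4, 3, 12, 3  # toy sizes (assumed)

# Stand-in for speech encoder output on one ASR utterance.
frames = [[rng.uniform(-1, 1) for _ in range(D_SPEECH)] for _ in range(N_FRAMES)]
pooled = mean_pool(frames, N_TOKENS)

# Stand-in for the frozen LLM's embeddings of the transcript tokens.
# Generated from a hidden linear map so the toy problem is learnable.
hidden = [[rng.uniform(-1, 1) for _ in range(D_SPEECH)] for _ in range(D_LLM)]
targets = [project(hidden, x) for x in pooled]

W = [[rng.uniform(-0.1, 0.1) for _ in range(D_SPEECH)] for _ in range(D_LLM)]
initial_loss = mse(W, pooled, targets)
W = train_projector(W, pooled, targets)
final_loss = mse(W, pooled, targets)
print(f"MSE before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

The point of the sketch is that the supervision signal needs only paired audio and transcripts: the targets come from the text side of an ASR corpus, and in this reading no LLM forward pass is required while the projector is trained.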

The authors consider this significant because their updated multi-stage training pipeline achieves better performance with fewer resources than previous state-of-the-art systems.