Benchmark · agentic

GAIA

General AI Assistants benchmark from Meta and Hugging Face.

2 results, 1 model
[Chart: GAIA score over time, y-axis 0 to 6. Qwen3-4B: 4.8 on 2026-06-18 (two data points).]
Timeline
  1. 2026-06-18 · Qwen3-4B · 4.8 pts · Data Recipe Boosts Long-Context Reasoning in LLMs
  2. 2026-06-18 · Qwen3-4B · 4.8 pts · Data Recipe Boosts Long-Context Reasoning in LLMs