Benchmark · agentic

BrowseComp

OpenAI's browser-use agent benchmark.

2 results · 1 model
[Chart: score by date, score axis 0 to 8. Qwen3-4B: 7 on 2026-06-18.]
Timeline
  1. 2026-06-18 · Qwen3-4B · 7.0 pts · Data Recipe Boosts Long-Context Reasoning in LLMs
  2. 2026-06-18 · Qwen3-4B · 7.0 pts · Data Recipe Boosts Long-Context Reasoning in LLMs