Benchmark · agentic
BrowseComp
OpenAI's benchmark for web-browsing agents.
2 results
1 model
Qwen3-4B
Timeline
- 2026-06-18 · Qwen3-4B · 7.0 pts · Data Recipe Boosts Long-Context Reasoning in LLMs
- 2026-06-18 · Qwen3-4B · 7.0 pts · Data Recipe Boosts Long-Context Reasoning in LLMs