Benchmark · agentic

WebArena

Realistic self-hosted websites for agent task completion.

3 results · 2 models
[Chart: score (%) by result date, y-axis 0 to 82. Series: Qwen-2.5-1.5B-Instruct, SkillMigrator]
Timeline
  1. 2026-06-17 · Qwen-2.5-1.5B-Instruct · 77.4% · EnvRL: Leveraging Environment Dynamics in Agentic RL
  2. 2026-06-17 · SkillMigrator · 8.0% · SkillMigrator Enables Cross-Site Web Skill Transfer via Layout Matching
  3. 2026-06-17 · Qwen-2.5-1.5B-Instruct · 77.4% · EnvRL: Leveraging Environment Dynamics in Agentic RL