The authors propose a method for scalable behavior cloning of browser agents by distilling human interaction trajectories into compact natural-language skills. These distilled skills can be read, retrieved, reused, and composed directly by the agent.
- The approach converts user interaction traces into reusable skills, addressing the bottleneck of decision-making under incomplete information.
- Distilled skills are organized into a skill graph, so that the skill set grows through consolidation rather than unbounded accumulation.
- The project aims to leverage collective human browsing skills instead of manually designed tasks.
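
To make the skill-graph idea concrete, here is a minimal sketch of how distilled skills might be consolidated and retrieved. It is not the authors' implementation: the `Skill` and `SkillGraph` names, the token-overlap (Jaccard) similarity, and the `merge_threshold` parameter are all assumptions chosen for illustration, and a real system would likely use a stronger similarity measure.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Skill:
    """Hypothetical distilled skill: a natural-language description plus steps."""
    name: str
    description: str
    steps: list[str]
    uses: set[str] = field(default_factory=set)  # names of skills this one composes


def _tokens(text: str) -> set[str]:
    # Lowercased whitespace tokens; a stand-in for a real text encoder.
    return set(text.lower().split())


def _jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class SkillGraph:
    """Illustrative skill store that merges near-duplicate skills on insert."""

    def __init__(self, merge_threshold: float = 0.6):
        self.skills: dict[str, Skill] = {}
        self.merge_threshold = merge_threshold  # assumed tuning knob

    def add(self, skill: Skill) -> str:
        """Insert a skill, or fold it into the most similar existing one.

        Returns the name of the skill that ends up holding the content.
        """
        new_tokens = _tokens(skill.description)
        best_name, best_score = None, 0.0
        for name, existing in self.skills.items():
            score = _jaccard(new_tokens, _tokens(existing.description))
            if score > best_score:
                best_name, best_score = name, score

        if best_name is not None and best_score >= self.merge_threshold:
            # Consolidate: keep the existing node, add only unseen steps and edges.
            target = self.skills[best_name]
            for step in skill.steps:
                if step not in target.steps:
                    target.steps.append(step)
            target.uses |= skill.uses
            return best_name

        self.skills[skill.name] = skill
        return skill.name

    def retrieve(self, query: str, k: int = 1) -> list[Skill]:
        """Return the k skills whose descriptions best overlap the query."""
        q = _tokens(query)
        ranked = sorted(
            self.skills.values(),
            key=lambda s: _jaccard(q, _tokens(s.description)),
            reverse=True,
        )
        return ranked[:k]
```

Under these assumptions, adding two differently named skills with near-identical descriptions yields a single node whose step list is the union of both, which is the sense in which the graph grows by consolidation instead of accumulating duplicates.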
The work suggests that browser agent scalability stems from reusing existing human interaction patterns.