DigitalCoach dataset reveals gaps in agentic computer use coaching

Researchers introduce DigitalCoach, a multimodal dataset of 72 human expert-novice computer-use coaching sessions, comprising 22,752 dialogue turns grounded in 28.1 hours of screen and input-event recordings across five software applications.

Automated evaluation shows that models give more direct instructions than human coaches but fewer explanations, error diagnoses, and knowledge-check questions.
When the coaching method is held fixed, model utterances resemble the human references but remain poorly grounded in the visual context.
Interactive evaluations confirm that model coaches lead learners to follow instructions passively, without deeper engagement.

The dataset lays a foundation for developing collaborative and proactive computer use coaching agents.