The authors introduce SkillCoach, a framework that derives skill-grounded process rubrics from real rollouts. It targets a specific difficulty: agents struggle to use skills reliably when a repository contains overlapping skills. SkillCoach evaluates agent trajectories along four dimensions: skill selection, skill following, skill composition, and skill-grounded reflection.

  • The system keeps the external verifier as a separate outcome signal, allowing process quality to be distinguished from accidental task success.
  • Evolved rubrics serve as process supervision for selecting high-quality training trajectories.
  • Experiments show that evolved rubrics substantially improve evaluation quality and expose failures hidden by final accuracy.
  • The framework provides stronger supervision signals than outcome-only filtering for enhancing agentic skill-use.

In short, SkillCoach separates process quality from accidental task success, and in doing so provides stronger supervision signals for enhancing agentic skill-use.
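
The difference between outcome-only filtering and process-aware filtering can be made concrete with a small sketch. This is not the authors' implementation: the `Trajectory` class, the [0, 1] score scale, the equal-weight mean, and the 0.75 threshold are all assumptions chosen for illustration. Only the four dimension names and the idea of a separate verifier signal come from the summary above.

```python
from dataclasses import dataclass
from typing import Dict, List

# The four rubric dimensions named in the summary.
DIMENSIONS = ("selection", "following", "composition", "reflection")


@dataclass
class Trajectory:
    """Hypothetical record of one agent rollout (illustrative only)."""
    task_id: str
    rubric_scores: Dict[str, float]  # assumed per-dimension scores in [0, 1]
    verifier_passed: bool            # external outcome signal, kept separate


def process_score(traj: Trajectory) -> float:
    # Assumption: an equal-weight mean over the four dimensions.
    return sum(traj.rubric_scores[d] for d in DIMENSIONS) / len(DIMENSIONS)


def outcome_only_filter(trajs: List[Trajectory]) -> List[Trajectory]:
    # Baseline: keep every trajectory the verifier accepted.
    return [t for t in trajs if t.verifier_passed]


def process_aware_filter(trajs: List[Trajectory],
                         threshold: float = 0.75) -> List[Trajectory]:
    # Keep trajectories that pass the verifier AND meet the assumed
    # process-quality threshold, dropping accidental successes.
    return [t for t in trajs
            if t.verifier_passed and process_score(t) >= threshold]


trajectories = [
    # Good process, task solved.
    Trajectory("t1", {"selection": 1.0, "following": 1.0,
                      "composition": 0.75, "reflection": 0.75}, True),
    # Poor process, but the verifier passed anyway (accidental success).
    Trajectory("t2", {"selection": 0.25, "following": 0.5,
                      "composition": 0.25, "reflection": 0.0}, True),
    # Good process, but the task was not solved.
    Trajectory("t3", {"selection": 1.0, "following": 0.75,
                      "composition": 1.0, "reflection": 0.75}, False),
]

outcome_ids = [t.task_id for t in outcome_only_filter(trajectories)]
process_ids = [t.task_id for t in process_aware_filter(trajectories)]
```

With this toy data, outcome-only filtering keeps both `t1` and `t2`, while the process-aware filter keeps only `t1`. Whether a trajectory like `t3` (sound process, failed outcome) should also be used for training is a design choice that this sketch does not settle.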