Microsoft Research introduces SkillOpt, a method that treats agent skill files as trainable parameters outside a frozen target model, transforming manual skill editing into a controlled optimization process. This approach improves agent reliability and consistency without updating the underlying model weights.

  • SkillOpt organizes skill editing as a forward-backward-update cycle where a separate optimizer model refines skills based on trajectory feedback.
  • The system uses bounded text edits, validation gating, and rejected-edit buffers to prevent uncontrolled prompt drift.
  • Evaluated across six benchmarks, seven target models, and three execution modes, SkillOpt achieved the best or tied-best results in all 52 evaluation cells.
  • With GPT-5.5 in direct chat mode, SkillOpt raised the average benchmark score from 58.8 to 82.3, an absolute gain of 23.5 points.
  • Optimized skills demonstrate transferability across model scales, agent harnesses, and related tasks, capturing reusable workflow knowledge.
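
The forward-backward-update cycle with bounded edits, validation gating, and a rejected-edit buffer can be pictured with a small sketch. This is an illustration under stated assumptions, not SkillOpt's actual code: every name here (`optimize_skill`, `forward`, `backward`, `validate`, `max_edit`, and the toy stand-ins) is hypothetical, the edit bound is approximated with a character-level similarity ratio, and the "optimizer model" is replaced by a hard-coded function.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Callable, List

# Hypothetical interfaces; none of these names come from SkillOpt itself.
Forward = Callable[[str], List[str]]                   # skill -> trajectory feedback
Backward = Callable[[str, List[str], List[str]], str]  # skill, feedback, rejected -> candidate
Validate = Callable[[str], float]                      # skill -> validation score


@dataclass
class SkillState:
    text: str
    score: float
    rejected: List[str] = field(default_factory=list)  # rejected-edit buffer


def edit_size(old: str, new: str) -> float:
    """Rough fraction of the text that changed, in [0, 1] (assumed metric)."""
    return 1.0 - SequenceMatcher(None, old, new, autojunk=False).ratio()


def optimize_skill(skill: str, forward: Forward, backward: Backward,
                   validate: Validate, steps: int = 4,
                   max_edit: float = 0.5) -> SkillState:
    state = SkillState(skill, validate(skill))
    for _ in range(steps):
        feedback = forward(state.text)            # forward: run the frozen target
        if not feedback:
            break                                 # nothing left to fix
        candidate = backward(state.text, feedback, state.rejected)  # backward: propose edit
        if candidate in state.rejected:
            continue                              # do not retry a known-bad edit
        if edit_size(state.text, candidate) > max_edit:
            state.rejected.append(candidate)      # step-size control: edit too large
            continue
        score = validate(candidate)
        if score > state.score:                   # validation gate
            state.text, state.score = candidate, score  # update: accept the edit
        else:
            state.rejected.append(candidate)
    return state


# Toy stand-ins so the sketch runs without any model calls.
REQUIRED = ["check units", "cite sources"]


def toy_forward(skill: str) -> List[str]:
    return [rule for rule in REQUIRED if rule not in skill]


def toy_validate(skill: str) -> float:
    return sum(rule in skill for rule in REQUIRED) / len(REQUIRED)


def toy_backward(skill: str, feedback: List[str], rejected: List[str]) -> str:
    rewrite = (feedback[0] + ". ") * 20           # wholesale rewrite, far too large
    if rewrite not in rejected:
        return rewrite
    return skill + "\n- " + feedback[0]           # small, targeted edit


result = optimize_skill("Be careful.\n- check units",
                        toy_forward, toy_backward, toy_validate)
print(result.text)
print(result.score, len(result.rejected))
```

In this toy run the first proposal is a wholesale rewrite that exceeds the edit bound and lands in the rejected buffer; the second is a one-line addition that passes the bound and the validation gate. The target "model" never changes; only the skill text does.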

By reframing skill writing as a training process with step-size control and validation, SkillOpt addresses uncontrolled skill evolution, enabling more dependable deployment of AI agents in production environments.