TAPO advances self-distillation by constructing explicit micro-reflective trajectories that retain erroneous reasoning and insert natural-language diagnoses. These trajectories, derived from correct and incorrect model rollouts, provide fine-grained error corrections anchored in the model's own reasoning, improving both first-pass reasoning and error correction compared to GRPO.