Apostate introduces a new contrastive co-vector edit operator, defined as E = I − R Dᵀ. The method removes refusal behavior by isolating harmful variance, and it preserves harmless behavior through a predictor W that is trained on harmless activations and suppressed on harmful prompts. On granite-3.3-8b, it reduces the refusal rate from 96.0% to 5.0% with only a 0.081-nat increase in KL divergence on harmless prompts.
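To make the operator form concrete, here is a minimal NumPy sketch of applying E = I − R Dᵀ to a single activation vector. The summary does not define R or D, so this sketch assumes D is a "read" co-vector that measures the unwanted component and R is a "write" vector that is subtracted in proportion to that measurement. It also assumes the normalisation Dᵀ R = 1. The function name `covector_edit` and the toy vectors are hypothetical, and the predictor W is not modelled here.

```python
import numpy as np

def covector_edit(R: np.ndarray, D: np.ndarray) -> np.ndarray:
    """Return E = I - R D^T for R, D of shape (d, k)."""
    d = R.shape[0]
    return np.eye(d) - R @ D.T

# Toy 4-dimensional example (hypothetical values, not taken from Apostate).
D = np.array([[1.0], [0.0], [0.0], [0.0]])   # assumed "read" co-vector
R = np.array([[1.0], [0.5], [0.0], [0.0]])   # assumed "write" vector, D^T R = 1

E = covector_edit(R, D)
x = np.array([2.0, 3.0, 4.0, 5.0])           # stand-in for an activation
x_edited = E @ x                              # x - R (D^T x)
print(x_edited)                               # [0. 2. 4. 5.]

# Under the assumption D^T R = 1, the readout along D vanishes after the
# edit and E is idempotent (an oblique projection).
assert np.allclose(D.T @ x_edited, 0.0)
assert np.allclose(E @ E, E)

# Special case R = D with unit norm: E = I - D D^T is symmetric, which is
# the familiar orthogonal single-direction ablation.
E_orth = covector_edit(D, D)
assert np.allclose(E_orth, E_orth.T)
```

Under these assumptions, choosing R different from D is what separates this form from a plain orthogonal projection: the component measured by D is removed, while the direction that is actually subtracted can be chosen independently.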
New ablation operator: contrastive co-vector edit