DriftGuard: Safety-Aware Multi-Monitor Detection and Selective Adaptation for Evolving Toxicity Moderation
This paper introduces DriftGuard, a framework that combines multi-monitor drift detection with selective model updating to address evolving toxicity in automated moderation systems. The system tracks specific safety-relevant shifts, such as identity-harm and toxic-risk drift, rather than relying solely on global distributional changes.