LeanGuard: A Fast and Light Approach for Robust Moderation
This paper investigates whether safety guardrails actually require chain-of-thought reasoning by training a lightweight bidirectional encoder alongside a reasoning-based guard on the same corpus. The authors find that removing reasoning does not improve moderation accuracy, challenging the common belief that step-by-step thinking is necessary for effective moderation.