Overrefusal from Small On-Premises LLMs in Criminal Legal Context
A study investigates overrefusal in small, on-device large language models (LLMs) processing legal prompts. It finds that authority-style prefixes systematically increase refusal rates by a factor of 2 to 20 relative to a no-prefix baseline. Role-play jailbreak prefixes showed mixed effects across models. Taken together, the results indicate that these small LLMs are unstable under the contextual framings typical of real institutional users.
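The "2 to 20 times" figure is a ratio of refusal rates: the share of prompts refused with a prefix, divided by the share refused without one. A minimal sketch of that arithmetic follows. The function names and the toy labels are assumptions made for illustration; they are not the study's code or data, and the summary does not say how refusals were detected.

```python
from typing import Sequence


def refusal_rate(refused: Sequence[bool]) -> float:
    """Fraction of responses labeled as refusals (True = refused)."""
    if not refused:
        raise ValueError("need at least one labeled response")
    return sum(refused) / len(refused)


def refusal_ratio(baseline: Sequence[bool], prefixed: Sequence[bool]) -> float:
    """Refusal rate with a prefix divided by the no-prefix refusal rate."""
    base = refusal_rate(baseline)
    if base == 0:
        # The ratio is undefined when the baseline never refuses.
        raise ValueError("baseline refusal rate is zero; ratio undefined")
    return refusal_rate(prefixed) / base


# Toy labels, invented for illustration only (not results from the study).
baseline = [True] + [False] * 19      # 1 refusal in 20 prompts
prefixed = [True] * 4 + [False] * 16  # 4 refusals in 20 prompts

ratio = refusal_ratio(baseline, prefixed)
print(f"refusal ratio vs. baseline: {ratio:.1f}x")
```

A ratio above 1 means the prefix made the model refuse more often than it did on the bare prompt. When the baseline rate is very low, small changes in the number of baseline refusals move the ratio a lot, so the raw rates are worth reporting alongside it.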