Researchers identify 'role confusion' as a key vulnerability in LLMs, where models misinterpret user input due to stylistic similarities with internal role tags. Destyling user prompts reduces attack success from 61% to 10%, showing that subtle text style changes can dramatically alter model behavior, even when the content appears identical to humans.
Prompt Injection as Role Confusion
from English