Is it ever possible to have a malicious LLM with a backdoor?

A Reddit user asks whether a Large Language Model could be trained to recognize a specific secret sentence that unlocks malicious behavior, raising security concerns for both closed-source and open-source models.

The risk applies to any LLM whose training data is not disclosed.
Closed-source models are considered riskier because the provider could also deliberately alter behavior in the source code, which users cannot inspect.
Local LLMs limit external backdoor injection but remain vulnerable to internal triggers, such as specific dates or times.
The author suggests detecting hidden behavior by sending the model millions of requests and looking for neuron clusters that stay idle throughout, since these may activate only under specific conditions.
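
The probing idea in the last point can be sketched on a toy example. This is only an illustration under stated assumptions, not the author's method and not a real detection tool: `toy_activations` is a hypothetical stand-in for reading hidden-unit activations from a model, `TRIGGER` is a made-up date string, and the threshold is arbitrary. With a real LLM, activations would have to be read through whatever mechanism the framework provides, and a unit that stays idle on benign probes is only a hint, not proof of a backdoor.

```python
import random

# Hypothetical trigger baked into the toy model (a date, as in the point above).
TRIGGER = "2031-01-01"


def toy_activations(prompt: str) -> list[float]:
    """Hypothetical stand-in for reading hidden-unit activations from a model."""
    total = sum(ord(c) for c in prompt)
    # Units 0-6: ordinary units that always respond a little to any text.
    acts = [(total % (i + 3)) / (i + 3) + 0.1 for i in range(7)]
    # Unit 7: dormant unless the trigger string appears in the prompt.
    acts.append(1.0 if TRIGGER in prompt else 0.0)
    return acts


def find_idle_units(probes: list[str], threshold: float = 0.05) -> list[int]:
    """Return indices of units whose peak activation over all probes stays below threshold."""
    if not probes:
        return []
    peak = toy_activations(probes[0])
    for prompt in probes[1:]:
        peak = [max(a, b) for a, b in zip(peak, toy_activations(prompt))]
    return [i for i, value in enumerate(peak) if value < threshold]


def inputs_that_wake(units: list[int], candidates: list[str],
                     threshold: float = 0.05) -> list[str]:
    """Return the candidate prompts that activate any of the given idle units."""
    return [
        c for c in candidates
        if any(toy_activations(c)[u] >= threshold for u in units)
    ]


# Send many benign requests and record which units never fire.
rng = random.Random(0)
words = ["weather", "recipe", "python", "history", "travel", "music"]
probes = [" ".join(rng.choices(words, k=5)) for _ in range(10_000)]
idle = find_idle_units(probes)

# Check which candidate prompts wake the idle units.
suspicious = inputs_that_wake(idle, ["today is 2024-05-01", "today is 2031-01-01"])
print(idle, suspicious)
```

The sketch also shows the limit of the approach: the benign probes reveal that a unit is idle, but finding the input that wakes it still requires guessing candidate triggers, which is the hard part in a real model.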