An analysis of a leaked 120 KB system prompt for Anthropic's Claude Fable 5 model details the architectural strategies behind its alignment and tool orchestration. The document highlights that the model shares weights with the unrestricted Mythos 5 and relies on safety classifiers at inference time.
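To make the shared-weights idea concrete, here is a minimal sketch of how one underlying model could back both a restricted and an unrestricted deployment, with a classifier applied only at inference. Everything in it is an assumption for illustration: `GatedModel`, `toy_generate`, `toy_risk`, the threshold, and the refusal text are hypothetical and are not taken from the leaked prompt or from any Anthropic system.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GatedModel:
    """Hypothetical wrapper: one generator, gated by a classifier at inference."""

    generate: Callable[[str], str]      # stand-in for the shared-weights model
    risk_score: Callable[[str], float]  # stand-in for a safety classifier
    threshold: float = 0.5              # assumed cutoff, purely illustrative
    refusal: str = "I can't help with that request."

    def respond(self, prompt: str) -> str:
        # Screen the incoming prompt before any generation happens.
        if self.risk_score(prompt) >= self.threshold:
            return self.refusal
        reply = self.generate(prompt)
        # Screen the generated reply as well before returning it.
        if self.risk_score(reply) >= self.threshold:
            return self.refusal
        return reply


def toy_generate(prompt: str) -> str:
    """Toy generator that just echoes its input."""
    return f"echo: {prompt}"


def toy_risk(text: str) -> float:
    """Toy classifier that flags a single keyword."""
    return 1.0 if "forbidden" in text.lower() else 0.0


# Same generator in both deployments; only the classifier differs.
restricted = GatedModel(toy_generate, toy_risk)
unrestricted = GatedModel(toy_generate, lambda text: 0.0)
```

In this sketch the two deployments differ only in the classifier passed to the wrapper, which is one plausible reading of "shared weights plus inference-time classifiers"; the actual mechanism is not described in the summary.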
- Dual-Deployment Safety: Fable 5 shares weights with Mythos 5 but uses safety classifiers during inference.
- Tool Schemas: The prompt contains 22 JSON tool schemas defining interfaces for external APIs and internal Anthropic services.
- Behavioral Refusal Softening: To soften the emotional impact of a refusal, the instructions explicitly forbid bullet points in refusal responses.
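The summary does not reproduce any of the 22 tool schemas, so the following is only a sketch of what a JSON tool schema of this general kind looks like and how a call might be checked against it. The `get_weather` tool, its fields, and the `validate_tool_input` helper are hypothetical; the `name` / `description` / `input_schema` layout follows the common convention for tool definitions and is assumed here, not quoted from the leaked prompt.

```python
from typing import Any

# Hypothetical tool definition; the name and fields are illustrative only.
WEATHER_TOOL: dict[str, Any] = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

# Maps JSON Schema type names to Python types (only the subset used here).
_JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_tool_input(tool: dict[str, Any], arguments: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the call is acceptable."""
    schema = tool["input_schema"]
    properties = schema.get("properties", {})
    errors = []
    # Every required field must be present.
    for key in schema.get("required", []):
        if key not in arguments:
            errors.append(f"missing required field: {key}")
    # Every supplied field must be declared, correctly typed, and within its enum.
    for key, value in arguments.items():
        spec = properties.get(key)
        if spec is None:
            errors.append(f"unexpected field: {key}")
            continue
        expected = _JSON_TYPES.get(spec.get("type"))
        if expected is not None and not isinstance(value, expected):
            errors.append(f"{key}: expected {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{key}: must be one of {spec['enum']}")
    return errors
```

A call such as `validate_tool_input(WEATHER_TOOL, {"city": "Oslo"})` returns an empty list, while omitting `city` reports the missing field. This hand-rolled check covers only a small subset of JSON Schema; a production system would more likely rely on a full validator.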
Taken together, these details offer insight into frontier model alignment by showing how Anthropic handles safety and tool use across its system prompt and inference-time architecture.