A user on r/LocalLLaMA asks how to avoid the roughly 10 seconds of prompt processing that a 7.1k-token system prompt costs at the start of every new session when running Ornith 35b with llama.cpp.

  • The user is running Ornith 35b with llama.cpp on a Strix Halo machine under Windows 10.
  • The current configuration reprocesses the entire 7.1k-token system prompt for each new session, which accounts for the roughly 10-second delay.
  • The provided command line includes flags such as `--cache-ram 8192`, `--cache-reuse 256`, and `--kv-unified`.

The user seeks a way to cache the static system prompt so that new sessions of their PI agent reuse it instead of reprocessing it.
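One possible approach, sketched here rather than taken from the thread, is to warm the server's prompt cache once after startup so that later sessions sharing the same prefix can reuse it. The sketch assumes the model is served by llama.cpp's `llama-server` and uses its `/completion` endpoint with `cache_prompt` enabled and `n_predict` set to 0, which evaluates the prompt without generating tokens. The helper names, the base URL, and the slot number are hypothetical. It also assumes the agent's requests begin with exactly the same tokens; if the agent goes through a chat endpoint, the chat template wraps the system prompt and the cached prefix may not match.

```python
import json
import urllib.request


def build_warmup_payload(system_prompt: str, slot_id: int = 0) -> dict:
    """Build a /completion request body that only evaluates the prompt."""
    return {
        "prompt": system_prompt,
        "n_predict": 0,        # evaluate the prompt, generate nothing
        "cache_prompt": True,  # keep the evaluated prefix in the slot's KV cache
        "id_slot": slot_id,    # assumption: the agent's requests land on this slot
    }


def warm_up(base_url: str, system_prompt: str, slot_id: int = 0,
            timeout: float = 120.0) -> dict:
    """POST the warm-up request to a running llama-server and return its JSON reply."""
    body = json.dumps(build_warmup_payload(system_prompt, slot_id)).encode("utf-8")
    request = urllib.request.Request(
        base_url.rstrip("/") + "/completion",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `warm_up("http://127.0.0.1:8080", system_prompt)` once after the server starts would pay the processing cost up front. Whether a given session actually hits the cache depends on slot assignment and on the prefix matching token for token, so the server's timing output is the place to confirm it.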