A user shares a bash configuration script for running the Qwen3.6-35B-A3B IQ4_XS model using the Vulkan backend in llama.cpp on an AMD 7900 XTX GPU with Ubuntu.

  • The setup utilizes the `llama-server` binary with Vulkan support and specific environment variables like `GGML_VK_VISIBLE_DEVICES`.
  • Key parameters include a context size of 262,144 tokens, 99 GPU layers, flash attention enabled, and continuous batching.
  • The configuration reports approximately 22k MiB memory usage and claims token generation speeds twice as fast as optimized ROCm 7.14 with lower memory footprint.