A user reports running Qwen3.6 27B MTP under llama.cpp on an RTX PRO 6000 Blackwell workstation to reduce reliance on Claude. They rate the model as comparable to Sonnet but report stability problems during coding sessions.
- The setup runs on Windows 11 with the VS Code Copilot extension and 4 parallel agents at full context (1M tokens).
- VRAM usage is approximately 83 GB of 97 GB, and llama.cpp was built with CUDA flags targeting the Blackwell architecture.
- Stability problems include random agent stops due to malformed responses and occasional llama.cpp crashes mid-session.
- The MTP (multi-token prediction) version provides a 15–20% speed increase, with quality on par with the non-MTP variant.
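
The figures in the list above can be turned into a quick back-of-the-envelope check. The following is a minimal sketch, not the author's own calculation. It assumes that "1M tokens" means exactly 1,000,000 tokens shared evenly across the 4 agents (the report does not say whether the context is per agent or total) and that the 15–20% figure is a throughput gain. All function names are hypothetical.

```python
# Figures taken from the report; everything else is an assumption.
VRAM_USED_GB = 83
VRAM_TOTAL_GB = 97
PARALLEL_AGENTS = 4
TOTAL_CONTEXT_TOKENS = 1_000_000  # assumption: "1M" read as a shared total


def vram_headroom(used_gb: float, total_gb: float) -> tuple[float, float]:
    """Return (free GB, fraction of VRAM in use)."""
    return total_gb - used_gb, used_gb / total_gb


def context_per_agent(total_tokens: int, agents: int) -> int:
    """Tokens available to each agent IF the context is split evenly (assumption)."""
    return total_tokens // agents


def time_saved(speedup_fraction: float) -> float:
    """Fraction of wall-clock time saved by a given throughput gain.

    A 15% throughput gain means the same work takes 1 / 1.15 of the time.
    """
    return 1 - 1 / (1 + speedup_fraction)


if __name__ == "__main__":
    free_gb, used_frac = vram_headroom(VRAM_USED_GB, VRAM_TOTAL_GB)
    print(f"VRAM in use: {used_frac:.1%}, headroom: {free_gb} GB")
    print(f"Context per agent: {context_per_agent(TOTAL_CONTEXT_TOKENS, PARALLEL_AGENTS):,} tokens")
    for gain in (0.15, 0.20):
        print(f"{gain:.0%} faster -> {time_saved(gain):.1%} less time")
```

Under these assumptions the card is about 86% full with 14 GB of headroom, each agent would get 250,000 tokens, and a 15–20% throughput gain corresponds to roughly 13–17% less wall-clock time for the same output.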
The author seeks advice from the community on improving setup stability and properly exploiting the hardware for local coding agents.
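
One generic mitigation for mid-session crashes, which the report does not say the author uses, is to run the server under a small supervisor that restarts it when it exits abnormally. The sketch below is illustrative only: `run_with_restarts` is a hypothetical helper, the demo command is a placeholder that simply exits with an error, and the actual llama.cpp server invocation would have to be substituted. It does not restore any agent state lost in the crash.

```python
import subprocess
import sys
import time


def run_with_restarts(command, max_restarts=3, backoff_seconds=2.0):
    """Run `command`, restarting it whenever it exits with a non-zero code.

    Returns the list of exit codes observed, one per launch.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        completed = subprocess.run(command)  # blocks until the process exits
        exit_codes.append(completed.returncode)
        if completed.returncode == 0:
            break  # clean shutdown, do not restart
        if attempt < max_restarts:
            # Linear backoff so a crash loop does not spin the machine.
            time.sleep(backoff_seconds * (attempt + 1))
    return exit_codes


if __name__ == "__main__":
    # Placeholder command that always fails; replace with the real server command.
    demo_command = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(run_with_restarts(demo_command, max_restarts=2, backoff_seconds=0.1))
```

Capping the number of restarts is a deliberate choice here: a server that crashes on every launch usually points to a configuration or build problem that restarting will not fix.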