A Reddit user reports settling on Unsloth's UD-IQ4_NL quantization of the Qwen 3.5 122b-a10b model for coding tasks on 64GB of VRAM.

  • The setup uses a 100k-token context in bf16 and runs at approximately 30 tokens per second.
  • Only a few layers are offloaded to CPU/RAM so that the rest of the model fits in the available VRAM.
  • The user also uses Qwen 3.6 models for specific needs but considers the 122b-a10b variant their daily driver.
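
To get a feel for why a 100k bf16 context matters when budgeting 64GB of VRAM, here is a rough sketch of the usual key/value cache estimate for a transformer. The function name and every architecture number below are hypothetical placeholders, not the real Qwen configuration or anything the user reported. The formula also assumes a plain attention cache, with one key and one value vector stored per token in each caching layer.

```python
def kv_cache_bytes(n_ctx, n_kv_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Estimate KV-cache size in bytes.

    Each caching layer stores one key and one value vector per token,
    hence the leading factor of 2. bf16 uses 2 bytes per element.
    """
    return 2 * n_ctx * n_kv_layers * n_kv_heads * head_dim * bytes_per_elem


# Hypothetical architecture values for illustration only
# (NOT the actual Qwen 3.5 122b-a10b configuration).
size = kv_cache_bytes(n_ctx=100_000, n_kv_layers=12, n_kv_heads=2, head_dim=256)
print(f"Estimated KV cache: {size / 2**30:.2f} GiB")
```

Whatever the cache needs comes out of the same VRAM budget as the quantized weights, which is one plausible reason a few layers end up in CPU/RAM.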