media r/LocalLLaMA · 1h ago · src: 5d ago · open_models

Reddit user selects Qwen 3.5 122b-a10b for 64GB VRAM coding


A Reddit user reports settling on Unsloth's UD-IQ4_NL quantization of the Qwen 3.5 122b-a10b model for coding tasks on a machine with 64GB of VRAM.

The user runs the model with a 100k-token context in bf16 and reports approximately 30 tokens per second.
Only a few layers are offloaded to CPU/RAM so that the rest fits within the 64GB of VRAM.
The user also uses Qwen 3.6 models depending on specific needs but considers the 122b-a10b variant their daily driver.
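
A rough back-of-the-envelope sketch of why a few layers end up in CPU/RAM. It assumes a nominal 4.5 bits per weight for IQ4_NL and treats "64GB" as 64 GiB. Both are assumptions, not figures from the post, and Unsloth's UD quantizations mix quantization types per tensor, so the real file size will differ. The KV cache for the 100k context is left out because its size depends on architecture details the post does not give.

```python
# Rough estimate of quantized weight size versus available VRAM.
# All constants below are illustrative assumptions, not values from the post.

def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

PARAMS_B = 122.0  # total parameters in billions, taken from the model name
BPW = 4.5         # assumed nominal bits per weight for IQ4_NL
VRAM_GIB = 64.0   # assumes "64GB" means 64 GiB

weights = weight_gib(PARAMS_B, BPW)
headroom = VRAM_GIB - weights  # what would be left for KV cache and buffers

print(f"estimated weights: {weights:.1f} GiB")
print(f"VRAM left over:    {headroom:.1f} GiB")
```

Under these assumptions the weights alone come to just under 64 GiB, leaving almost no room for the context cache and compute buffers, which is consistent with the user moving a few layers to system memory.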
