r/LocalLLaMA · 1h ago · open_models

Best coding model for 3x Spark setup?


A user is asking which coding model works best on a dedicated setup of three Asus Ascent GX10 (GB10) units that will serve 5 to 10 concurrent users.

The proposed serving stack is vLLM combined with llama-swap.
Candidate models include Qwen 3.5 122B, Qwen3-Coder, and DeepSeek V4 Flash DSpark.
The user also asks how much context headroom needs to be reserved per concurrent user, and whether three Spark units are the right number for this configuration.
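Per-user context headroom is mostly KV-cache memory, which grows linearly with both context length and the number of concurrent sequences. The sketch below estimates a worst-case budget for a model using standard grouped-query attention, assuming every user fills the full context window at the same time. The layer count, KV-head count and head dimension are hypothetical placeholders, not the shapes of any model named above; real values come from each model's config, and models that use multi-head latent attention or hybrid linear-attention layers cache far less per token, so the formula does not apply to them as written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionConfig:
    """Hypothetical GQA model shape; substitute real values from the model's config."""
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_elem: int = 2  # 2 for a BF16/FP16 KV cache, 1 for an FP8 cache


def kv_bytes_per_token(cfg: AttentionConfig) -> int:
    # Factor 2: one key vector and one value vector per KV head, per layer.
    return 2 * cfg.num_layers * cfg.num_kv_heads * cfg.head_dim * cfg.bytes_per_elem


def kv_headroom_gib(cfg: AttentionConfig, context_tokens: int, concurrent_users: int) -> float:
    """Worst case: every user holds a full context window simultaneously."""
    total_bytes = kv_bytes_per_token(cfg) * context_tokens * concurrent_users
    return total_bytes / 2**30


if __name__ == "__main__":
    # Placeholder shape (48 layers, 4 KV heads, head_dim 128); not a specific model.
    cfg = AttentionConfig(num_layers=48, num_kv_heads=4, head_dim=128)
    for users in (5, 10):
        gib = kv_headroom_gib(cfg, context_tokens=131_072, concurrent_users=users)
        print(f"{users} users x 128k context: {gib:.1f} GiB of KV cache")
```

With these placeholder numbers each token costs 96 KiB, so one 128k-token session needs 12 GiB and ten simultaneous full-context sessions need 120 GiB on top of the weights; switching to an FP8 KV cache (`bytes_per_elem=1`) halves that. Since each GB10 unit has 128 GB of unified memory shared between weights and cache, this kind of estimate is what decides how many users and how much context fit per unit.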

Importance 0/3 · r/LocalLLaMA · Code generation
