A user is seeking recommendations for the best coding model to run on a dedicated hardware setup of three Asus Ascent GX10 (GB10) units, intended to serve 5-10 concurrent users.

  • The proposed infrastructure utilizes vLLM combined with llama-swap.
  • Potential models under consideration include Qwen 3.5 122B, Qwen 3-coder, and DeepSeek V4 Flash DSpark.
  • The user asks how much context headroom to budget per concurrent user and whether three Spark units are the right number for this configuration.
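
The per-user context headroom question can be approached with a rough KV-cache estimate. The sketch below assumes a standard attention layout with grouped KV heads, where each token stores one key and one value vector per layer. The layer count, KV-head count, head dimension, and memory budget are hypothetical placeholders, not the real values for any of the models listed above; they should be replaced with numbers from the chosen model's config. Models that use compressed or hybrid attention will need less than this formula suggests, and vLLM allocates KV blocks on demand, so this is closer to a worst case where every user fills their whole context window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KVShape:
    """Attention shape used for KV-cache sizing (all values are placeholders)."""
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_elem: int = 2  # 2 bytes for FP16/BF16 cache entries


def kv_bytes_per_token(shape: KVShape) -> int:
    # Factor of 2 covers one key and one value vector per layer.
    return 2 * shape.num_layers * shape.num_kv_heads * shape.head_dim * shape.bytes_per_elem


def kv_gib_for_users(shape: KVShape, context_tokens: int, users: int) -> float:
    # Worst case: every user holds a full context window at the same time.
    return kv_bytes_per_token(shape) * context_tokens * users / 2**30


def max_users(shape: KVShape, context_tokens: int, kv_budget_gib: float) -> int:
    # How many full-context users fit in the memory left after model weights.
    per_user_bytes = kv_bytes_per_token(shape) * context_tokens
    return int(kv_budget_gib * 2**30 // per_user_bytes)


if __name__ == "__main__":
    # Hypothetical shape: 48 layers, 8 KV heads, head dimension 128, 16-bit cache.
    shape = KVShape(num_layers=48, num_kv_heads=8, head_dim=128)
    print(kv_bytes_per_token(shape))             # bytes of KV cache per token
    print(kv_gib_for_users(shape, 32_768, 10))   # GiB for 10 users at 32k context
    print(max_users(shape, 32_768, 40))          # users that fit in a 40 GiB budget
```

With these placeholder numbers, each token costs 192 KiB of cache, so a full 32k-token context is 6 GiB per user and ten such users need 60 GiB on top of the model weights. The useful output is the scaling: memory grows linearly with both context length and user count, so halving the per-user context limit doubles the number of users that fit.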