User considers buying 4x Ascend GX10s for GLM5.2 inference

A Reddit user is considering buying four Ascend GX10 units in preparation for running a future open-source "fable 5" model, citing benchmarks from other users who ran GLM5.2 on similar hardware.

Benchmarks show GLM5.2 achieves 400-500 tokens per second for prompt processing and approximately 15 tokens per second for output at a 128k context length on four DGX Sparks or Ascend GX10s.
The setup draws around 1000W of power, which the user notes is manageable.
Quantization is suggested as a way to make the setup more usable at these inference speeds.
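
To put the reported figures in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes that "128k" means 128,000 tokens, that the quoted rates stay constant across the whole prompt, and that the ~1000 W draw is sustained during generation; none of these assumptions come from the original post, and the function names are purely illustrative.

```python
def prefill_seconds(prompt_tokens: int, tokens_per_second: float) -> float:
    """Time to process a prompt at a constant prompt-processing rate."""
    return prompt_tokens / tokens_per_second


def joules_per_token(power_watts: float, tokens_per_second: float) -> float:
    """Energy spent per generated token at a constant power draw."""
    return power_watts / tokens_per_second


# Assumption: "128k" is taken as 128,000 tokens.
PROMPT_TOKENS = 128_000

# Reported prompt-processing range: 400-500 tokens per second.
slow_prefill = prefill_seconds(PROMPT_TOKENS, 400)
fast_prefill = prefill_seconds(PROMPT_TOKENS, 500)

# Reported output rate (~15 tokens/s) and power draw (~1000 W).
energy_per_output_token = joules_per_token(1000, 15)

print(f"Prefill for a full prompt: {fast_prefill:.0f}-{slow_prefill:.0f} s")
print(f"Energy per output token: ~{energy_per_output_token:.0f} J")
```

Under these assumptions, filling the entire 128k context would take roughly four to five and a half minutes before the first output token appears, which gives a sense of why the post raises usability at the current speeds.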