GLM5.2 performance

A Reddit user is collecting community-reported inference speeds for Nvidia’s 460GB nvfp4 checkpoint of GLM5.2.

The author reports running the model at approximately 1 token per second in a simulation harness and extrapolates from that figure to 75 tokens per second on a real CUDA MGPU machine.
Participants are asked to state their tokens per second first, followed by details on the inference engine and hardware specifications.
An example submission format includes memory configuration, CPU model, and disk I/O speeds.
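
The requested ordering (tokens per second first, then engine and hardware details) could be sketched roughly as follows. This is only an illustration: the `Submission` class, its field names, the `tok/s` label, and the `|` separator are assumptions made here, not the format used in the original post.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """Hypothetical record for one community report (field names are assumed)."""
    tokens_per_second: float
    engine: str    # inference engine name and version
    memory: str    # memory configuration
    cpu: str       # CPU model
    disk_io: str   # disk I/O speeds

    def format(self) -> str:
        # Tokens per second goes first, as the post asks; the rest follows.
        return (
            f"{self.tokens_per_second:.1f} tok/s | engine: {self.engine} | "
            f"memory: {self.memory} | CPU: {self.cpu} | disk I/O: {self.disk_io}"
        )


def tokens_per_second(generated_tokens: int, elapsed_seconds: float) -> float:
    """Throughput as generated tokens divided by wall-clock seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return generated_tokens / elapsed_seconds
```

A participant would fill in their own measurements, for example `Submission(tokens_per_second(n_tokens, seconds), "<engine>", "<memory>", "<cpu>", "<disk>").format()`, where every argument is a placeholder rather than a reported result.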