A Reddit user is collecting community inference speed data for Nvidia’s 460GB nvfp4 checkpoint of GLM5.2.
- The author reports running the model at approximately 1 token per second in a simulation harness and extrapolates that to about 75 tokens per second on a real multi-GPU (MGPU) CUDA machine.
- Participants are asked to state their tokens per second first, followed by details on the inference engine and hardware specifications.
- An example submission format includes memory configuration, CPU model, and disk I/O speeds.
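
The requested ordering (tokens per second first, then engine and hardware) can be sketched as a small record type. This is only an illustration: the `SpeedReport` class, its field names, and the placeholder values are assumptions, not the exact format used in the thread.

```python
from dataclasses import dataclass


@dataclass
class SpeedReport:
    """Hypothetical record for one community submission."""
    tokens_per_second: float  # stated first, as the post requests
    engine: str               # inference engine used
    memory: str               # memory configuration
    cpu: str                  # CPU model
    disk_io: str              # disk I/O speed

    def format(self) -> str:
        # Lead with the speed so submissions are easy to scan and compare.
        return (
            f"{self.tokens_per_second:g} tok/s | engine: {self.engine} | "
            f"memory: {self.memory} | CPU: {self.cpu} | disk I/O: {self.disk_io}"
        )


# Placeholder values only; the 1 tok/s figure is the author's reported harness speed.
report = SpeedReport(
    tokens_per_second=1.0,
    engine="example-engine",
    memory="example memory configuration",
    cpu="example CPU model",
    disk_io="example disk I/O speed",
)
line = report.format()
print(line)
```

Keeping the speed as a numeric field, rather than free text, would let collected reports be sorted or averaged later.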