A Reddit user is collecting community-reported inference speeds for Nvidia’s 460GB nvfp4 checkpoint of GLM5.2.

  • The author reports roughly 1 token per second in a simulation harness and extrapolates that to 75 tokens per second on a real multi-GPU (MGPU) CUDA machine.
  • Participants are asked to state their tokens per second first, followed by details on the inference engine and hardware specifications.
  • An example submission format includes memory configuration, CPU model, and disk I/O speeds.
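
The requested ordering (tokens per second first, then engine and hardware) can be sketched as a small formatter. This is only an illustration: the `SpeedReport` class, its field names, and the `|` separator are assumptions, not the format used in the thread, and the hardware values are left as placeholders because the summary does not give them.

```python
from dataclasses import dataclass


@dataclass
class SpeedReport:
    # Hypothetical field names; the thread only names the categories.
    tokens_per_second: float
    engine: str
    memory: str
    cpu: str
    disk_io: str

    def format(self) -> str:
        # Tokens per second comes first, as the post requests,
        # followed by the inference engine and hardware details.
        return (
            f"{self.tokens_per_second:g} tok/s | engine: {self.engine} | "
            f"memory: {self.memory} | CPU: {self.cpu} | disk I/O: {self.disk_io}"
        )


# The author's own figure from the post; hardware fields are placeholders.
report = SpeedReport(
    tokens_per_second=1.0,
    engine="simulation harness",
    memory="<memory configuration>",
    cpu="<CPU model>",
    disk_io="<disk I/O speed>",
)
print(report.format())
```

Putting the speed figure at the start of each line would make replies easy to scan and sort, which is presumably why the post asks for it first.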