GLM-5.2-FP8 HGX-H200 SGLang Docker Deployment Config

A user shares a Docker configuration for serving GLM-5.2-FP8 with SGLang on HGX-H200 hardware. The setup reaches a 262k-token context length and 70 tokens per second, using tensor parallelism across 8 GPUs (TP=8) and a memory fraction of 0.83. The user notes that the official vLLM recipes do not work on H200 because of limitations in FP8 KV-cache quantization for the DSV3 architecture.
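Below is a minimal sketch of what such a launch command might look like. It assumes the stock `lmsysorg/sglang` image and SGLang's `sglang.launch_server` entry point. The post's settings map to `--tp 8`, `--mem-fraction-static 0.83`, and `--context-length` (with "262k" read here as 262144). The image tag, the `/models/GLM-5.2-FP8` path, the port, and the helper name `build_docker_command` are hypothetical, not taken from the user's config. The helper only assembles the command and does not run Docker.

```python
import shlex

# Hypothetical values: the post gives TP=8, a 0.83 memory fraction and a
# 262k context, but not the image tag, model path, or port used.
IMAGE = "lmsysorg/sglang:latest"        # assumed image tag
MODEL_PATH = "/models/GLM-5.2-FP8"      # hypothetical mount point for weights
PORT = 30000                            # SGLang's default server port


def build_docker_command(
    model_path: str = MODEL_PATH,
    tp: int = 8,                        # tensor parallelism across 8 H200s
    mem_fraction: float = 0.83,         # fraction of GPU memory SGLang reserves
    context_length: int = 262144,       # "262k" assumed to mean 2**18 tokens
    port: int = PORT,
) -> list[str]:
    """Return a `docker run` argv that launches an SGLang server (sketch)."""
    docker_args = [
        "docker", "run", "--rm",
        "--gpus", "all",                # expose every GPU to the container
        "--ipc=host",                   # share host IPC so TP workers get shared memory
        "-p", f"{port}:{port}",
        "-v", "/models:/models",        # hypothetical host directory with weights
        IMAGE,
    ]
    server_args = [
        "python3", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--tp", str(tp),
        "--mem-fraction-static", str(mem_fraction),
        "--context-length", str(context_length),
        "--trust-remote-code",          # assumption: may be unnecessary for this model
        "--host", "0.0.0.0",
        "--port", str(port),
    ]
    return docker_args + server_args


if __name__ == "__main__":
    # Print one shell-ready line that can be pasted on the HGX host.
    print(shlex.join(build_docker_command()))
```

Running the script prints a single quoted command line via `shlex.join`. Keeping the values as function parameters makes it easy to try a different memory fraction or context length without editing the argument list.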