r/LocalLLaMA · 7d ago · open_models

GLM-5.2-FP8 HGX-H200 SGLang Docker Deployment Config


A user shares a Docker configuration for running GLM-5.2-FP8 on an HGX-H200 node with SGLang. With a tensor-parallel size of 8 (TP=8) and a memory fraction of 0.83, the setup reaches a 262k-token context length and 70 tokens per second. The user notes that the official vLLM recipes do not work on H200 because of limitations in FP8 KV-cache quantization for the DSV3 (DeepSeek-V3) architecture.
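A minimal sketch of how the parameters above might map onto a Docker launch, written in Python so each value is explicit. It assumes the public `lmsysorg/sglang` image and SGLang's standard `sglang.launch_server` flags (`--tp-size`, `--mem-fraction-static`, `--context-length`). The model repo id `zai-org/GLM-5.2-FP8` is a hypothetical placeholder, "262k" is read as 262,144 tokens, and this is not the poster's actual configuration. In SGLang, `--mem-fraction-static` is the share of each GPU's memory reserved for model weights plus the KV-cache pool; on a 141 GB H200, 0.83 works out to roughly 117 GB per GPU.

```python
import os
import shlex

# Assumptions: image tag and model id are illustrative placeholders,
# not taken from the original post.
IMAGE = "lmsysorg/sglang:latest"
MODEL_PATH = "zai-org/GLM-5.2-FP8"  # hypothetical Hugging Face repo id
H200_MEMORY_GB = 141.0  # HBM3e capacity of one H200 GPU


def static_budget_gb(total_gb: float = H200_MEMORY_GB, fraction: float = 0.83) -> float:
    """Per-GPU memory SGLang reserves for weights + KV-cache pool."""
    return total_gb * fraction


def build_docker_command(
    model_path: str = MODEL_PATH,
    tp_size: int = 8,
    mem_fraction: float = 0.83,
    context_length: int = 262_144,  # "262k" interpreted as 256 * 1024 tokens
    port: int = 30000,
) -> list[str]:
    """Return a `docker run` argv that launches an SGLang server."""
    if not 0.0 < mem_fraction < 1.0:
        raise ValueError("mem_fraction must be strictly between 0 and 1")
    hf_cache = os.path.expanduser("~/.cache/huggingface")
    docker_args = [
        "docker", "run", "--rm",
        "--gpus", "all",
        "--ipc=host",  # share host IPC namespace so workers get enough /dev/shm
        "-p", f"{port}:{port}",
        "-v", f"{hf_cache}:/root/.cache/huggingface",  # reuse downloaded weights
        IMAGE,
    ]
    server_args = [
        "python3", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--tp-size", str(tp_size),  # shard the model across 8 GPUs
        "--mem-fraction-static", str(mem_fraction),
        "--context-length", str(context_length),
        "--trust-remote-code",
        "--host", "0.0.0.0",
        "--port", str(port),
    ]
    return docker_args + server_args


if __name__ == "__main__":
    print(shlex.join(build_docker_command()))
    print(f"static budget per GPU: {static_budget_gb():.1f} GB")
```

The KV-cache dtype is deliberately left at SGLang's default here, since the post's point is that FP8 KV-cache quantization is what breaks the vLLM recipes on this architecture.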

Importance: 2/3 · r/LocalLLaMA · Zhipu AI · Code generation · Evaluation & benchmarks · Inference efficiency

Benchmarks

Benchmark	Model	Score
LMSYS Arena (Elo)	GLM-5.2-FP8	—
