The user asks whether a 467 GB GLM 5.2 model can run on four servers, each with 512 GB of RAM and 409.6 GB/s of memory bandwidth, using CPU-only inference with Unsloth. They are weighing two options: splitting the model across the nodes to increase token speed, or running 8-bit versions across two clusters to fit larger models and improve performance.
Can GLM 5.2 be run on 4x AMD EPYC servers with 512 GB RAM each?
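A rough way to reason about both options is to treat CPU decoding as memory-bandwidth bound, so that tokens/s is at most bandwidth divided by the bytes of weights read per token. The sketch below rests on that simplifying assumption. The RAM, bandwidth, and model-size figures come from the question. OS_HEADROOM_GB is a hypothetical reserve for the OS, KV cache, and buffers. The bytes read per token is left as an input because the active-parameter count for GLM 5.2 is not given here. Passing the full model size gives a dense worst case, and a mixture-of-experts model would read less.

```python
# Back-of-envelope sketch, assuming decode is purely memory-bandwidth bound
# and ignoring network latency between nodes. Real throughput will be lower.

NODE_RAM_GB = 512.0     # per-server RAM (from the question)
NODE_BW_GBPS = 409.6    # per-server memory bandwidth in GB/s (from the question)
MODEL_GB = 467.0        # model size (from the question)
OS_HEADROOM_GB = 32.0   # hypothetical: RAM reserved for OS, KV cache, buffers


def fits(model_gb: float, nodes: int) -> bool:
    """True if the weights fit in the usable RAM of `nodes` servers."""
    usable_gb = nodes * (NODE_RAM_GB - OS_HEADROOM_GB)
    return model_gb <= usable_gb


def decode_upper_bound(bytes_per_token_gb: float, nodes: int,
                       tensor_parallel: bool) -> float:
    """Upper bound on single-stream decode speed in tokens/s.

    Pipeline split: each node streams only its share of the layers, but for a
    single token the nodes run one after another, so the total time per token
    matches one node streaming everything.
    Tensor split: every node streams its share of each layer at the same time,
    so in theory the aggregate bandwidth scales with the node count.
    """
    if tensor_parallel:
        return nodes * NODE_BW_GBPS / bytes_per_token_gb
    return NODE_BW_GBPS / bytes_per_token_gb


if __name__ == "__main__":
    print(fits(MODEL_GB, 1))                                      # True
    print(round(decode_upper_bound(MODEL_GB, 1, False), 2))       # 0.88
    print(round(decode_upper_bound(MODEL_GB, 4, False), 2))       # 0.88
    print(round(decode_upper_bound(MODEL_GB, 4, True), 2))        # 3.51
```

Under these assumptions, a layer-wise (pipeline) split across four nodes does not raise single-stream token speed, while a tensor-style split could in theory, if the interconnect keeps up. To check the two-cluster idea, call fits() with the actual size of the 8-bit files and nodes=2.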