Running Llama 3.1 405B on a Single 8xA100 Node with Hot-Loaded LoRA Adapters
A user demonstrates running Llama 3.1 405B, quantized to AWQ INT4, on a single node with eight A100 80GB GPUs. The setup supports up to 30 fine-tuned LoRA adapters ("specialists"), which can be hot-loaded and switched in under 200 ms.
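To see why a 4-bit quantized 405B model plausibly fits on one such node, a rough memory budget helps. The sketch below is back-of-envelope arithmetic only, not the user's actual configuration. It assumes exactly 4 bits per weight and ignores AWQ scale and zero-point overhead, the KV cache, activations, and the adapters themselves. All names in it are illustrative.

```python
# Back-of-envelope memory budget (approximate; see assumptions in the text above).
PARAMS = 405e9          # nominal parameter count of Llama 3.1 405B
BYTES_PER_PARAM = 0.5   # assumption: 4-bit weights, no scale/zero-point overhead
NUM_GPUS = 8
GPU_MEM_GB = 80         # A100 80GB, treated as decimal GB for simplicity


def weight_footprint_gb(params: float, bytes_per_param: float) -> float:
    """Approximate size of the quantized weights in GB."""
    return params * bytes_per_param / 1e9


weights_gb = weight_footprint_gb(PARAMS, BYTES_PER_PARAM)
total_gb = NUM_GPUS * GPU_MEM_GB
headroom_gb = total_gb - weights_gb        # left for KV cache, activations, adapters
per_gpu_weights_gb = weights_gb / NUM_GPUS  # assumes weights are sharded evenly

print(f"quantized weights: ~{weights_gb:.1f} GB")
print(f"node memory:       {total_gb} GB")
print(f"headroom:          ~{headroom_gb:.1f} GB")
print(f"weights per GPU:   ~{per_gpu_weights_gb:.1f} GB")
```

Under these assumptions the weights take roughly a third of the node's memory. For comparison, the same parameter count at 16 bits per weight would need about 810 GB, more than the node's 640 GB.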