Reddit user seeks advice on multi-model backends and config swapping

A Reddit user is planning to deploy a machine with multiple GPUs for serving coding and Hermes models, seeking solutions that allow flexible configuration swapping without manual intervention.

The user wants to switch, depending on current needs, among three configurations: two smaller models for less-intensive tasks, one large model spread across multiple GPUs, or a larger coding-focused model.
They have evaluated llama-swap, LiteLLM, llamactl, and GPUStack but found each lacking, citing limited flexibility, an enterprise focus, or heavy tuning requirements.
The hardware setup includes up to four 3090s on a Threadripper 3945WX with ~128GB of DDR4 RAM.
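
The configuration swapping described above can be pictured as a set of named profiles, each mapping models to GPUs, plus a planner that works out what to stop and start when moving between them. The following is only a minimal sketch under assumptions: the profile names, model labels, and GPU assignments are hypothetical and do not come from the post, and the code does not use the API of any of the tools mentioned.

```python
from dataclasses import dataclass

GPU_COUNT = 4  # the post mentions up to four 3090s


@dataclass(frozen=True)
class ModelSpec:
    name: str    # hypothetical model label, not taken from the post
    gpus: tuple  # indices of the GPUs this model would occupy


# Hypothetical profiles mirroring the three configurations in the summary.
PROFILES = {
    "two_small": (ModelSpec("small-a", (0, 1)), ModelSpec("small-b", (2, 3))),
    "one_large": (ModelSpec("large", (0, 1, 2, 3)),),
    "coding": (ModelSpec("coder-large", (0, 1, 2, 3)),),
}


def validate(profile):
    """Raise ValueError if a GPU index is out of range or assigned twice."""
    used = set()
    for spec in profile:
        for gpu in spec.gpus:
            if not 0 <= gpu < GPU_COUNT:
                raise ValueError(f"{spec.name}: GPU {gpu} does not exist")
            if gpu in used:
                raise ValueError(f"{spec.name}: GPU {gpu} is already assigned")
            used.add(gpu)


def plan_switch(current, target):
    """Return (models to stop, models to start) for a profile change.

    `current` may be None when nothing is running yet.
    """
    validate(PROFILES[target])
    running = set(PROFILES[current]) if current else set()
    wanted = set(PROFILES[target])
    stop = sorted(spec.name for spec in running - wanted)
    start = sorted(spec.name for spec in wanted - running)
    return stop, start


if __name__ == "__main__":
    stop, start = plan_switch("two_small", "one_large")
    print("stop:", stop)
    print("start:", start)
```

In a real deployment the stop and start lists would be handed to whatever backend launches the inference servers; that part is deliberately left out because it depends on the tool chosen.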

The user is asking the community for recommendations on tools that minimize manual intervention and allow self-contained orchestration by Hermes.