Lemonade v10.8 introduces dynamic VRAM management that automatically unloads idle models and shrinks the KV cache to reclaim GPU memory. It also adds cloud offload for OpenAI-compatible providers, so models are served locally first with optional routing to the cloud. A new MCP gateway, reached via POST /mcp, exposes local models as tools to MCP-aware applications.
Lemonade v10.8 Adds Automatic Memory Management, Cloud Offload, and MCP Tool Support
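
For readers who want a feel for what a call to the MCP gateway might look like, here is a minimal sketch that builds (but does not send) a request to POST /mcp. It assumes the gateway follows the standard MCP convention of JSON-RPC 2.0 messages over HTTP and uses the standard `tools/list` method. The host and port in `BASE_URL` and the helper name `build_mcp_request` are hypothetical; only the `/mcp` path comes from the release summary.

```python
import json
import urllib.request

# Assumption: host and port are placeholders, not taken from the release notes.
# Only the /mcp path is stated in the summary.
BASE_URL = "http://localhost:8000"


def build_mcp_request(method, params=None, request_id=1):
    """Build a JSON-RPC 2.0 POST request for the /mcp endpoint (hypothetical helper)."""
    payload = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        payload["params"] = params
    return urllib.request.Request(
        BASE_URL + "/mcp",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # MCP's HTTP transport expects clients to accept both of these.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


# "tools/list" is the standard MCP method for discovering available tools.
req = build_mcp_request("tools/list")

# Sending requires a running server, so it is left commented out:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

A real MCP session normally begins with an `initialize` exchange before `tools/list`, so an MCP client library would typically handle this handshake rather than hand-built requests like the one above.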