arXiv cs.AI · 6h ago

CrossPool: Efficient Multi-LLM Serving for Cold MoE Models through KV-Cache and Weight Disaggregation

CrossPool is a serving engine for cold Mixture-of-Experts (MoE) models. It disaggregates FFN weights and KV-cache into separate GPU memory pools to address the memory inefficiency that arises when requests are sparse. By consolidating static weights and provisioning KV-cache memory dynamically to match active demand, the system aims to improve GPU memory utilization and support bursty long-context requests.
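
To make the pooling idea concrete, here is a minimal sketch of a KV-cache pool shared by several models. It is not CrossPool's implementation: the class name `SharedKVPool`, the block-granular accounting, and the block counts are all assumptions chosen for illustration. It only shows why one shared pool can absorb a burst that a fixed per-model partition of the same total size could not.

```python
class SharedKVPool:
    """Hypothetical block-granular KV-cache pool shared across models."""

    def __init__(self, total_blocks):
        # Every block starts out free; a block id stands in for a GPU memory region.
        self.free_blocks = list(range(total_blocks))
        # Maps (model, request_id) to the block ids that request currently holds.
        self.owned = {}

    def allocate(self, model, request_id, num_blocks):
        """Reserve blocks for a request, or return None if the pool cannot fit it."""
        if num_blocks > len(self.free_blocks):
            return None  # leave the pool untouched on failure
        blocks = [self.free_blocks.pop() for _ in range(num_blocks)]
        self.owned.setdefault((model, request_id), []).extend(blocks)
        return blocks

    def release(self, model, request_id):
        """Return a finished request's blocks to the pool; report how many."""
        blocks = self.owned.pop((model, request_id), [])
        self.free_blocks.extend(blocks)
        return len(blocks)


# Three cold models sharing 12 blocks (assumed numbers). A static split would
# give each model 4 blocks, so a 10-block long-context burst could not be served.
pool = SharedKVPool(total_blocks=12)
burst = pool.allocate("model_a", "req-1", 10)   # fits in the shared pool
second = pool.allocate("model_b", "req-2", 4)   # does not fit until req-1 ends
freed = pool.release("model_a", "req-1")
```

In this sketch a request that does not fit is simply rejected; a real engine would more likely queue or preempt, which the summary above does not describe.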

arXiv cs.AI · 7h ago

Reinforcement Learning for Computer-Use Agents with Autonomous Evaluation

The authors propose a reinforcement learning fine-tuning framework that uses autonomous vision-language evaluation as a scalable supervision signal for GUI agents, eliminating the need for manual labels or task-specific heuristics. The method treats evaluator feedback as a noisy binary reward channel and derives a noise-corrected estimator for Proximal Policy Optimization, addressing the difficulty of obtaining machine-readable rewards in open-ended desktop environments.
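
One standard way to correct a noisy binary reward is sketched below. It is not necessarily the estimator derived in the paper. It assumes the evaluator's false-positive rate `fp_rate` and false-negative rate `fn_rate` are known and do not depend on the trajectory; both names and the example rates are hypothetical. Under those assumptions, rescaling the observed reward as `(observed - fp_rate) / (1 - fp_rate - fn_rate)` gives a value whose expectation equals the true reward.

```python
import random


def corrected_reward(observed, fp_rate, fn_rate):
    """Unbiased estimate of a true 0/1 reward from one noisy 0/1 observation.

    fp_rate = P(observed = 1 | true = 0), fn_rate = P(observed = 0 | true = 1).
    Both rates are assumed known here; in practice they would need estimating.
    """
    denom = 1.0 - fp_rate - fn_rate
    if denom <= 0.0:
        raise ValueError("evaluator must be better than chance: fp_rate + fn_rate < 1")
    return (observed - fp_rate) / denom


def noisy_evaluator(true_reward, fp_rate, fn_rate, rng):
    """Simulate a binary evaluator that flips the true reward at the given rates."""
    if true_reward == 1:
        return 0 if rng.random() < fn_rate else 1
    return 1 if rng.random() < fp_rate else 0


def mean_corrected(true_reward, fp_rate, fn_rate, n, seed=0):
    """Average the corrected reward over n simulated noisy evaluations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        observed = noisy_evaluator(true_reward, fp_rate, fn_rate, rng)
        total += corrected_reward(observed, fp_rate, fn_rate)
    return total / n


# Assumed noise rates for illustration only.
FP, FN = 0.1, 0.2
mean_success = mean_corrected(1, FP, FN, n=20000)  # should be close to 1
mean_failure = mean_corrected(0, FP, FN, n=20000)  # should be close to 0
```

The corrected values can fall outside [0, 1], and the variance grows as `fp_rate + fn_rate` approaches 1, so the correction trades bias for variance. In a PPO loop, the corrected value would replace the raw evaluator output as the episode reward before advantages are computed.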