arXiv cs.AI · 4h ago

The African Language Tax: Quantifying the Cost, Latency, and Context Penalty of Tokenizing African Languages in Frontier LLMs

A study quantifies the structural tokenization penalty that African languages face in commercial large language models. Frontier tokenizers split text in these languages into more subword tokens than equivalent English text, so speakers pay more for the same content, wait longer for responses, and fit less material into the context window. Across 20 African languages and 11 frontier tokenizers, every tested language incurs a premium over English, with a median cost of 1.88 times that of English and a maximum of 8.92 times for text in the N'Ko script.
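Because the cost, latency, and context penalties all follow from the token premium, the arithmetic is easy to sketch. The snippet below is a hedged illustration that assumes per-token billing, a constant decode rate, and a fixed context window. The price, throughput, and window size are hypothetical placeholders; only the 1.88x and 8.92x factors come from the study.

```python
# Back-of-the-envelope arithmetic for a tokenization premium p: the same
# content needs p times as many tokens as its English equivalent.
# The 1.88 and 8.92 factors come from the study summary; the price,
# window size, and decode speed below are hypothetical placeholders.

PRICE_PER_MILLION = 3.00      # hypothetical USD per 1M tokens
CONTEXT_WINDOW = 128_000      # hypothetical context window, in tokens
TOKENS_PER_SECOND = 50.0      # hypothetical constant decode throughput


def cost_usd(english_tokens: int, premium: float) -> float:
    """Price of content that would take `english_tokens` tokens in English."""
    return english_tokens * premium * PRICE_PER_MILLION / 1_000_000


def decode_seconds(english_tokens: int, premium: float) -> float:
    """Generation time, assuming a fixed tokens-per-second decode rate."""
    return english_tokens * premium / TOKENS_PER_SECOND


def effective_window(premium: float) -> float:
    """Usable context, measured in English-token equivalents of content."""
    return CONTEXT_WINDOW / premium


for label, p in [("English", 1.0), ("median (1.88x)", 1.88), ("N'Ko (8.92x)", 8.92)]:
    print(f"{label:>15}: ${cost_usd(1_000_000, p):6.2f} per 1M English-equivalent tokens, "
          f"{decode_seconds(500, p):5.1f} s for a 500-token reply, "
          f"window ~ {effective_window(p):,.0f} English-equivalent tokens")
```

Under these assumptions all three penalties scale by the same factor: at the median premium a reply takes 1.88 times as long, and the window holds only about 53% (1/1.88) as much content as it would for English.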

arXiv cs.AI · 5h ago

G³VLA: Geometric inductive bias for Vision-Language-Action Models

The authors propose G³VLA, a camera-aware geometric module that injects calibrated structure into the visual-token stream of pretrained Vision-Language-Action (VLA) models without altering their action space or imitation objective. The approach combines intrinsic-conditioned ray embeddings, projective positional encoding, and bidirectional cross-view fusion to address the mismatch between 2D image coordinates and robot camera geometry.
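A short sketch makes the geometry behind "intrinsic-conditioned ray embeddings" concrete. This is not G³VLA's implementation. It is a generic pinhole-camera computation, assuming no skew and no lens distortion, that back-projects each image-patch center through the inverse intrinsic matrix K⁻¹ to obtain a unit viewing ray. The function name `patch_rays` and the camera parameters are hypothetical.

```python
# Per-patch viewing rays from pinhole camera intrinsics (generic sketch,
# not G3VLA's code). Assumes K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]
# with no skew and no lens distortion.
import math
from typing import List, Tuple

Ray = Tuple[float, float, float]


def patch_rays(fx: float, fy: float, cx: float, cy: float,
               width: int, height: int, patch: int) -> List[List[Ray]]:
    """Return a (height // patch) x (width // patch) grid of unit ray directions."""
    rows: List[List[Ray]] = []
    for i in range(height // patch):
        row: List[Ray] = []
        for j in range(width // patch):
            # Pixel coordinates of the patch center.
            u = (j + 0.5) * patch
            v = (i + 0.5) * patch
            # K^{-1} [u, v, 1]^T, written out for the no-skew pinhole K.
            x, y, z = (u - cx) / fx, (v - cy) / fy, 1.0
            norm = math.sqrt(x * x + y * y + z * z)
            row.append((x / norm, y / norm, z / norm))
        rows.append(row)
    return rows


# Hypothetical 224x224 camera with 16-pixel patches (a 14x14 token grid).
rays = patch_rays(fx=200.0, fy=200.0, cx=112.0, cy=112.0,
                  width=224, height=224, patch=16)
print(len(rays), len(rays[0]))   # grid shape
print(rays[7][7])                # patch beside the principal point, close to (0, 0, 1)
```

Two cameras with different focal lengths or principal points assign different rays to the same pixel; that camera-specific information is exactly what a plain 2D positional encoding of image coordinates does not carry.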

arXiv cs.AI · 5h ago

CrossPool: Efficient Multi-LLM Serving for Cold MoE Models through KV-Cache and Weight Disaggregation

CrossPool is a serving engine for cold Mixture-of-Experts (MoE) models, that is, models whose requests arrive only sparsely. It disaggregates FFN weights and KV-cache into separate GPU memory pools to address the memory inefficiency that arises in these sparse-request scenarios. By consolidating static weights and provisioning KV-cache memory dynamically according to active demand, the system aims to improve GPU memory utilization and support bursty long-context requests.
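The disaggregation idea can be illustrated with a toy allocator. This is not CrossPool's design. The hypothetical `DisaggregatedPools` class below, with sizes in abstract units rather than GPU bytes, keeps model weights in one pool where each model is loaded once, and gives the KV-cache its own shared pool of fixed-size blocks that are reserved per request and returned when the request finishes.

```python
# Toy model of disaggregated memory pools (illustrative only, not CrossPool's
# implementation). Sizes are abstract units; block size is hypothetical.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DisaggregatedPools:
    weight_capacity: int                 # capacity of the weight pool
    kv_blocks: int                       # total blocks in the KV-cache pool
    block_tokens: int = 16               # tokens held by one KV block
    weights: Dict[str, int] = field(default_factory=dict)
    owned: Dict[str, List[int]] = field(default_factory=dict)
    free_blocks: List[int] = field(default_factory=list, init=False)

    def __post_init__(self) -> None:
        # Every KV block starts out free and is not tied to any model.
        self.free_blocks = list(range(self.kv_blocks))

    def load_model(self, name: str, size: int) -> None:
        """Place a model's static weights in the weight pool (once per model)."""
        if name in self.weights:
            return
        if sum(self.weights.values()) + size > self.weight_capacity:
            raise MemoryError(f"weight pool full, cannot load {name}")
        self.weights[name] = size

    def admit(self, request_id: str, context_tokens: int) -> bool:
        """Reserve just enough KV blocks for this request, or reject it."""
        needed = -(-context_tokens // self.block_tokens)  # ceiling division
        if needed > len(self.free_blocks):
            return False
        self.owned[request_id] = [self.free_blocks.pop() for _ in range(needed)]
        return True

    def finish(self, request_id: str) -> None:
        """Return a finished request's KV blocks to the shared pool."""
        self.free_blocks.extend(self.owned.pop(request_id))


pools = DisaggregatedPools(weight_capacity=100, kv_blocks=64)
pools.load_model("moe-a", 40)
pools.load_model("moe-b", 40)
print(pools.admit("r1", 900))   # True: 900 tokens -> 57 blocks of 16
print(pools.admit("r2", 200))   # False: needs 13 blocks, only 7 free
pools.finish("r1")
print(pools.admit("r2", 200))   # True: r1's blocks are back in the pool
```

Because KV blocks are not partitioned per model in this toy, a burst of long-context requests can draw on any free blocks, and memory released by finished requests is immediately reusable by the next request.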