Reasoning models
arXiv cs.LG · 1d ago

Memory-Efficient Graph Filtering for Scalable Collaborative Filtering

Mem-GF introduces a memory-efficient graph filtering method that approximates polynomial graph filters using Krylov subspaces, eliminating the need to store the full item similarity graph. It achieves up to 5.74× lower memory usage and 4.38× faster runtime while maintaining superior recommendation accuracy compared to state-of-the-art methods, scaling effectively to datasets with tens of millions of interactions.
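The summary does not give Mem-GF's algorithm, so the following is only a minimal sketch of the general idea it builds on: a polynomial graph filter can be applied to a vector using matrix-vector products alone, so the item-item similarity matrix is never materialized. It assumes an unnormalized similarity graph P = RᵀR built from a user-item interaction matrix R; the function name `apply_polynomial_filter`, the coefficients, and the lack of normalization are illustrative assumptions, not details from the paper. The vectors x, Px, P²x, ... computed in the loop are the ones that span a Krylov subspace.

```python
import numpy as np

def apply_polynomial_filter(R, x, coeffs):
    """Return sum_k coeffs[k] * (R.T @ R)^k @ x without forming R.T @ R.

    R      : (num_users, num_items) interaction matrix (hypothetical input)
    x      : (num_items,) signal to filter, e.g. one user's interaction row
    coeffs : polynomial coefficients a_0, a_1, ..., a_K (illustrative)
    """
    result = np.zeros_like(x, dtype=float)
    v = x.astype(float)              # v holds P^k x, starting at k = 0
    for k, a in enumerate(coeffs):
        if k > 0:
            v = R.T @ (R @ v)        # next power of P via two matvecs with R
        result += a * v
    return result
```

Only R and a few item-length vectors are held in memory, which is where the savings come from when the number of items is large and R is sparse.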

arXiv cs.LG · 1d ago

Distilling Transformers into Recurrent Transformers for Efficient Memory

A new distillation method transfers the observation compression strategy of full-history transformers to recurrent models. By training a teacher model to compress observation histories into fixed-size bottlenecks, the approach aligns the student's memory with the teacher's compression. This enables recurrent transformers to achieve near-full-history performance with linear-time complexity, making them viable for long-horizon robotics applications.

arXiv cs.LG · 1d ago

LIG: Layer-wise Integrated Gradients for Transformer Flow Analysis

LIG extends Integrated Gradients to set-to-set maps in Transformers, enabling token-level attribution within layers. It analyzes module-wise and layer-wide attribution consistency and tracks information flow through separate attention and MLP contributions, using the target token embedding and zero or zero-attention outputs as baselines. LIG operates at module boundaries without retraining or custom interpreters, offering a diagnostic XAI tool for Transformer internals.
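For readers unfamiliar with the base technique, here is a minimal sketch of standard Integrated Gradients on a plain vector input. It is not LIG's layer-wise extension. It assumes the gradient of the model output is available as a function; `integrated_gradients`, `f`, and `grad_f` are hypothetical names, and the toy function f(x) = Σ xᵢ² stands in for a real model. The path integral from the baseline to the input is approximated with a midpoint Riemann sum.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate (x - baseline) * integral_0^1 grad F(baseline + a (x - baseline)) da."""
    alphas = (np.arange(steps) + 0.5) / steps          # midpoints of [0, 1]
    grads = np.stack([grad_fn(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)          # per-feature attribution

# Toy stand-in for a model: F(x) = sum of squares, with its analytic gradient.
def f(x):
    return float(np.sum(x ** 2))

def grad_f(x):
    return 2.0 * x
```

A useful sanity check is the completeness property: the attributions should sum to F(x) minus F(baseline), which gives a quick diagnostic for whether enough integration steps were used.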

arXiv cs.AI · 1d ago

TASER: Task-Differentiated Skill Expansion for Heterogeneous Continual Learning

TASER introduces a framework that dynamically expands and routes atomic skills for continual learning across highly heterogeneous tasks. It reduces catastrophic forgetting and improves plasticity by ensuring semantic distinctness and efficient capacity allocation through skill detection and routing mechanisms. Evaluated on HeteroCLBench, a benchmark of 19 diverse tasks across 9 cognitive dimensions, TASER outperforms existing baselines.

arXiv cs.AI · 1d ago

Benchmark Evaluation of Small Language Models for Arabic NLP

A benchmark of 240 Arabic test items across eight domains and ten skills assesses twelve small language models in zero-shot settings. Gemma 3 (12B) achieved the highest overall score (4.548/5), followed by Aya and C4AI Command Arabic, with performance linked more to Arabic alignment and instruction-following than model size. Common failure modes include prompt leakage, hallucination, and weak task adherence.