Reasoning models
arXiv cs.CL · 2d ago

Transformer Models: Architectures, Applications, and Critical Assessment

This review presents a taxonomy of transformer-based language models across domain verticals, covering encoder-only, decoder-only, encoder-decoder, long-context, permutation-based, and generator-discriminator variants. It evaluates post-2023 advancements such as instruction tuning and mixture-of-experts scaling, and it assesses model deployments in healthcare, finance, legal, education, customer service, creative writing, and scientific work, linking each to specific model capabilities. The paper critically analyzes model architectures on four key deployment axes, quantifies the trade-off between parameter count and energy cost, and examines how alignment methods, data provenance, and benchmark saturation shape what counts as 'state of the art'.
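The architecture families in this taxonomy differ largely in which positions each token may attend to. The sketch below illustrates that with standard conventions (BERT-style bidirectional, GPT-style causal, encoder-decoder cross-attention, and an XLNet-style permutation mask); it is not taken from the review, and the function names are hypothetical.

```python
from typing import List, Sequence

# mask[i][j] is True when position i is allowed to attend to position j
Mask = List[List[bool]]

def bidirectional_mask(n: int) -> Mask:
    """Encoder-only style (e.g. BERT): every token sees every token."""
    return [[True] * n for _ in range(n)]

def causal_mask(n: int) -> Mask:
    """Decoder-only style (e.g. GPT): token i sees positions 0..i only."""
    return [[j <= i for j in range(n)] for i in range(n)]

def cross_attention_mask(n_dec: int, n_enc: int) -> Mask:
    """Encoder-decoder style: each decoder position sees the whole encoder output."""
    return [[True] * n_enc for _ in range(n_dec)]

def permutation_mask(order: Sequence[int]) -> Mask:
    """Permutation-based style (XLNet-like query stream): `order` lists positions
    in a sampled factorization order; token i sees tokens predicted before it."""
    n = len(order)
    rank = {pos: step for step, pos in enumerate(order)}
    return [[rank[j] < rank[i] for j in range(n)] for i in range(n)]
```

For example, `permutation_mask([2, 0, 1])` lets position 1 see positions 0 and 2, while position 2, predicted first, sees nothing.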

arXiv cs.CL · 2d ago

Age of LLM: Benchmark for LLM Reasoning and Diplomacy

Age of LLM introduces a turn-based 1v1 benchmark in which two LLMs compete on a 13×7 grid under fog of war, with full diplomacy and strict JSON reliability rules. Findings show that the nuclear rush dominates, that diplomacy is prolific but rarely succeeds, and that illegal actions reveal belief-tracking errors; reliability is only weakly linked to victory. The corpus is small and unbalanced, so the results offer only a preliminary view of LLM reasoning under adversarial uncertainty.
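Strict JSON rules mean every model move must parse and be legal before it is applied. A minimal validator might look like the following; the action names, the message fields (`action`, `unit`, `to`), the 13-column by 7-row orientation, and the one-step adjacency rule are all hypothetical, not the benchmark's actual protocol.

```python
import json
from typing import Dict, Optional, Tuple

GRID_W, GRID_H = 13, 7  # assumption: 13 columns by 7 rows
ALLOWED_ACTIONS = {"move", "attack", "propose_treaty"}  # hypothetical action names

def validate_action(raw: str, own_units: Dict[str, Tuple[int, int]]) -> Tuple[bool, Optional[str]]:
    """Return (ok, reason). A strict harness would treat any failure as a lost turn."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return False, "malformed JSON"
    if not isinstance(msg, dict) or msg.get("action") not in ALLOWED_ACTIONS:
        return False, "unknown action"
    if msg["action"] == "propose_treaty":
        return True, None  # diplomacy messages need no board check in this sketch
    unit = msg.get("unit")
    if not isinstance(unit, str) or unit not in own_units:
        # Naming a unit the model does not control hints at a belief-tracking error.
        return False, "unknown unit"
    target = msg.get("to")
    if not (isinstance(target, list) and len(target) == 2 and all(type(v) is int for v in target)):
        return False, "bad target"
    x, y = target
    if not (0 <= x < GRID_W and 0 <= y < GRID_H):
        return False, "off board"
    ux, uy = own_units[unit]
    if max(abs(x - ux), abs(y - uy)) != 1:  # hypothetical rule: one step, diagonals allowed
        return False, "not adjacent"
    return True, None
```

Keeping the failure reason separate from the verdict is what lets an analysis distinguish malformed output from well-formed but illegal moves.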

arXiv cs.CL · 2d ago

Cross-Lingual Proverb Studies Reveal Cultural Meaning Preservation in LLMs

A study evaluates how large language models preserve cultural meaning when generating narratives from equivalent proverbs across 15 languages. Moral lessons remain semantically consistent, but narrative agency and structure shift systematically, and results converge strongly across model families. The research suggests that current evaluations, by focusing only on semantic similarity, may overestimate how well cultural meaning is preserved.

arXiv cs.LG · 2d ago

Memory-Efficient Graph Filtering for Scalable Collaborative Filtering

Mem-GF introduces a memory-efficient graph filtering method that approximates polynomial graph filters using Krylov subspaces, eliminating the need to store the full item similarity graph. It achieves up to 5.74× lower memory usage and up to 4.38× faster runtime while still outperforming state-of-the-art methods in recommendation accuracy, and it scales to datasets with tens of millions of interactions.
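Krylov methods evaluate a matrix function applied to a vector using only matrix-vector products. The sketch below assumes a GF-CF-style item graph P = Rn^T Rn, where Rn is a degree-normalized users-by-items interaction matrix, and applies a polynomial filter p(P) to one user's interaction vector through a k-step Lanczos basis, so the items-by-items matrix is never formed. It is a generic Lanczos illustration, not Mem-GF's algorithm; the normalization, filter coefficients, and function names are hypothetical, and a real implementation would keep Rn sparse.

```python
import numpy as np

def normalized_interactions(R: np.ndarray) -> np.ndarray:
    """Symmetric degree normalization D_u^{-1/2} R D_i^{-1/2} (hypothetical choice)."""
    du = np.maximum(R.sum(axis=1), 1e-12)
    di = np.maximum(R.sum(axis=0), 1e-12)
    return R / np.sqrt(du)[:, None] / np.sqrt(di)[None, :]

def item_graph_matvec(Rn: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Compute (Rn^T Rn) v without materializing the items-by-items matrix."""
    return Rn.T @ (Rn @ v)

def lanczos_filter(Rn, x, coeffs, k):
    """Approximate p(P) x, with p(t) = sum_j coeffs[j] * t**j, as ||x|| V_k p(T_k) e_1."""
    n = x.shape[0]
    beta0 = np.linalg.norm(x)
    if beta0 == 0:
        return np.zeros(n)
    V = np.zeros((n, k))            # Krylov basis: n_items x k, the main memory cost
    alphas, betas = [], []
    v_prev, v, beta, m = np.zeros(n), x / beta0, 0.0, 0
    for j in range(k):
        V[:, j] = v
        w = item_graph_matvec(Rn, v) - beta * v_prev
        alpha = v @ w
        w = w - alpha * v
        w = w - V[:, : j + 1] @ (V[:, : j + 1].T @ w)  # full reorthogonalization
        alphas.append(alpha)
        m = j + 1
        beta = np.linalg.norm(w)
        if beta < 1e-10 or j == k - 1:  # invariant subspace found or budget reached
            break
        betas.append(beta)
        v_prev, v = v, w / beta
    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)  # small tridiagonal
    evals, evecs = np.linalg.eigh(T)
    p_evals = sum(c * evals**j for j, c in enumerate(coeffs))
    y = evecs @ (p_evals * evecs[0, :])  # p(T) e_1 via eigendecomposition
    return beta0 * V[:, :m] @ y

# Toy usage: 20 users x 12 items, degree-2 filter on user 0's interactions
rng = np.random.default_rng(0)
R = (rng.random((20, 12)) < 0.3).astype(float)
Rn = normalized_interactions(R)
scores = lanczos_filter(Rn, R[0], coeffs=[0.0, 1.0, 0.5], k=5)
```

Because p has degree 2, a basis with k ≥ 3 reproduces p(P)x exactly up to rounding; for filters of higher effective degree, k trades accuracy against the n_items × k basis that must be stored.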

arXiv cs.LG · 2d ago

Distilling Transformers into Recurrent Transformers for Efficient Memory

A new distillation method transfers the observation-compression strategy of full-history transformers to recurrent models. The approach trains a teacher model to compress observation histories into fixed-size bottlenecks and then aligns the student's recurrent memory with the teacher's compressed representation. This lets recurrent transformers approach full-history performance with linear-time complexity, making them viable for long-horizon robotics applications.
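The teacher-student alignment described above can be sketched as a per-timestep memory-matching loss. Everything concrete here is hypothetical: the teacher compresses the full history with K learned query slots (a Perceiver-style attention bottleneck), the student is a plain tanh recurrence rather than a recurrent transformer, and the loss is mean squared error. The paper's architectures and objective may differ, and only the forward loss is shown, not training.

```python
import numpy as np

D, M, K = 8, 16, 4  # observation dim, slot width, number of bottleneck slots (hypothetical)

def teacher_bottleneck(history, queries, Wk, Wv):
    """Full-history teacher: K query slots attend over every past observation,
    compressing a history of any length into a fixed K*M vector."""
    keys = history @ Wk                                  # (T, M)
    values = history @ Wv                                # (T, M)
    scores = queries @ keys.T / np.sqrt(M)               # (K, T)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerically stable softmax
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)
    return (attn @ values).reshape(-1)                   # (K*M,)

def student_step(mem, obs, Wm, Wx):
    """Recurrent student: constant work per step, so a rollout is linear in length."""
    return np.tanh(mem @ Wm + obs @ Wx)

def memory_distillation_loss(obs_seq, teacher_params, student_params):
    """Mean squared gap between student memory and teacher bottleneck, averaged over
    timesteps. Recomputing the teacher on every prefix is quadratic in sequence
    length, but the teacher is only needed during training."""
    queries, Wk, Wv = teacher_params
    Wm, Wx = student_params
    mem = np.zeros(K * M)
    losses = []
    for t in range(len(obs_seq)):
        mem = student_step(mem, obs_seq[t], Wm, Wx)
        target = teacher_bottleneck(obs_seq[: t + 1], queries, Wk, Wv)
        losses.append(np.mean((mem - target) ** 2))
    return float(np.mean(losses))

# Toy usage with random parameters
rng = np.random.default_rng(0)
teacher = (rng.normal(size=(K, M)), rng.normal(size=(D, M)), rng.normal(size=(D, M)))
student = (0.1 * rng.normal(size=(K * M, K * M)), 0.1 * rng.normal(size=(D, K * M)))
obs_seq = rng.normal(size=(10, D))
loss = memory_distillation_loss(obs_seq, teacher, student)
```

The teacher's output has the same size for a 1-step or 10-step history, which is what makes it a usable target for a fixed-size recurrent memory.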

arXiv cs.LG · 2d ago

LIG: Layer-wise Integrated Gradients for Transformer Flow Analysis

LIG extends Integrated Gradients to set-to-set maps in Transformers, enabling token-level attribution within layers. It analyzes module-wise and layer-wide attribution consistency and tracks information flow through separate attention and MLP contributions, using the target token embedding, zero outputs, or zero-attention outputs as baselines. LIG operates at module boundaries without retraining or custom interpreters, making it a diagnostic explainable-AI (XAI) tool for Transformer internals.
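LIG builds on standard Integrated Gradients, which integrates gradients along a straight path from a baseline to the input. The sketch below shows only that base computation on a toy scalar module with a hand-written gradient and a zero baseline; LIG's set-to-set formulation and its target-token and zero-attention baselines are not reproduced, and all names are hypothetical.

```python
import numpy as np

def module_output(x, W):
    """Toy stand-in for a sub-module reduced to a scalar: F(x) = sum(tanh(W @ x))."""
    return float(np.tanh(W @ x).sum())

def module_grad(x, W):
    """Analytic gradient of F: dF/dx = W^T (1 - tanh(W @ x)**2)."""
    return W.T @ (1.0 - np.tanh(W @ x) ** 2)

def integrated_gradients(x, baseline, W, steps=256):
    """IG_i = (x_i - b_i) * integral over a in [0, 1] of dF/dx_i(b + a (x - b)),
    approximated with a midpoint Riemann sum over `steps` points."""
    alphas = (np.arange(steps) + 0.5) / steps
    grad_sum = np.zeros_like(x)
    for a in alphas:
        grad_sum += module_grad(baseline + a * (x - baseline), W)
    return (x - baseline) * grad_sum / steps

rng = np.random.default_rng(0)
W = 0.5 * rng.normal(size=(4, 6))
x = rng.normal(size=6)
baseline = np.zeros(6)  # zero baseline
attributions = integrated_gradients(x, baseline, W)
```

The completeness property, that attributions sum to F(x) - F(baseline), is a quick check that the number of integration steps is adequate.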