Reasoning models — korshunov.ai

Decoupling Search from Reasoning in LLM Agents

Decoupled Search Grounding (DSG) separates search functionality from reasoning models, enabling vendor-agnostic, tunable, and reusable search grounding. DSG achieves near-native accuracy on SimpleQA with 91% lower search cost and 99.4% warm-cache hit rate, while reducing latency by 68% and preserving concise output contracts.

arXiv cs.CL · 8d ago

GraphPO: Graph-based Policy Optimization for Reasoning Models

GraphPO introduces a directed acyclic graph framework to represent reasoning rollouts, merging semantically equivalent paths to reduce redundant exploration. It assigns efficiency and correctness advantages to edges, improving inference efficiency and process supervision while reducing advantage-estimation variance. Experiments show GraphPO outperforms chain- and tree-based methods on three LLMs across reasoning and agentic search tasks under identical token or response budgets.

arXiv cs.CL · 8d ago

Speech-Based Dementia Assessment with Error Mitigation

This study improves accuracy in dementia screening by using speech-derived features from the German Syndrom-Kurz-Test. Models combine transcript scores and Whisper embeddings to reduce scoring errors and approximate expert ratings by compensating for missing motor subtests. The approach achieves strong correlation with expert ratings and effectively distinguishes cognitive status groups.

arXiv cs.CL · 8d ago

CADE: Direct Timestep Embedding for Time-Series Question Answering

CADE introduces direct timestep embedding and contrastive alignment to preserve time-series structure in LLMs. By mapping each timestep directly into the LLM embedding space, it avoids tokenization bottlenecks and outperforms existing baselines on six TSQA tasks.

arXiv cs.CL · 8d ago

G-IdiomAlign: Gloss-Pivoted Benchmark for Cross-Lingual Idiom Alignment

G-IdiomAlign introduces a gloss-pivoted benchmark that uses English glosses from Wiktionary to anchor idioms. It includes two protocols: controlled multiple-choice equivalence and gloss-contrastive generation. Glosses improve semantic alignment, but results remain modest, leaving substantial room for improvement in cross-lingual idiom generation.

arXiv cs.CL · 8d ago

Steerable Model Merging for Multilingual Reasoning

Steerable Model Merging (ST-Merge) introduces a gated cross-attention mechanism to adaptively weight source models during multilingual reasoning. It outperforms existing baselines on four multilingual reasoning benchmarks across 21 languages by dynamically prioritizing models based on input characteristics.

arXiv cs.CL · 8d ago

Sumi: Open Uniform Diffusion Language Model from Scratch

Sumi is a 7B-parameter uniform diffusion language model pretrained from scratch on 1.5T tokens. It competes with autoregressive models on knowledge, reasoning, and coding tasks but underperforms on commonsense benchmarks, likely due to its education-heavy data mixture. The model weights, checkpoints, and full training recipe are publicly released.

arXiv cs.CL · 8d ago

Leadership as Coordination Control in Multi-Agent LLM Teams

Process-level coordination control adds value only when the initial majority consensus is unreliable, the task is recoverable, and unguided interaction fails to repair errors. Across multiple models and tasks, no leadership style outperforms the others in accuracy, a result consistent with contingency theory rather than a failure of the approach.

arXiv cs.CL · 8d ago

Index Sickness Elimination via Baseline-Log Physical Separation

In a 391-session AI collaboration project, LLMs exhibited 'Index Sickness', a failure mode in which symbolic complexity leads to self-referential outputs disconnected from reality. The 'Pang Principle' asserts that natural language conveys higher semantic quality than symbolic systems. The 'Baseline-Log Physical Separation' mechanism reduced AI instruction volume by 75% and eliminated recurrence of Index Sickness in subsequent sessions.

arXiv cs.CL · 8d ago

Urdu Katib Handwritten Dataset Released for UHTR Research

The Urdu Katib Handwritten Dataset (UKHD) is a new benchmark dataset of offline Urdu handwritten text lines, curated from historical Katib writings in Nastalique calligraphy. CRNN-based models are evaluated on it, and the CNN-BGRU-CTC architecture achieves the lowest error rates, making it a strong baseline for Urdu handwritten text recognition.

arXiv cs.CL · 8d ago

Human-AI Coevolution Framework Reveals Social Intelligence Emergence

The Human-AI Coevolution Dynamics Framework (HACD-H) introduces a unified model for long-term human-AI interaction, integrating emotional adaptation, memory, and personality into a self-organizing social cognitive system. Results show social intelligence emerges through coevolution, with a significant negative correlation between social intelligence and social cognitive energy (r = -0.391, p < 0.001), and progressive energy reduction over time in interaction trajectories.

arXiv cs.AI · 8d ago

PID Feedback Control for Interpretable Activation Steering in Music Generation

This paper proposes a Dual Steering framework using Gram-Schmidt Orthogonalization to decouple Pitch and Duration control in symbolic music generation. By isolating latent directions via DiffMean and applying PID feedback, it enables deterministic, independent modulation of signal attributes without retraining, reducing conceptual interference and signal degradation.

arXiv cs.AI · 8d ago
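
The three ingredients named in the Dual Steering summary above (DiffMean directions, Gram-Schmidt orthogonalization, PID feedback) can be illustrated with a minimal pure-Python sketch. This is not the paper's implementation: the helper names, the toy 2-D activations, the gain values, and the target and measured values are all assumptions chosen for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def diff_mean(positive: Sequence[Sequence[float]],
              negative: Sequence[Sequence[float]]) -> Vector:
    """DiffMean direction: mean(positive activations) - mean(negative activations)."""
    return [p - q for p, q in zip(mean_vector(positive), mean_vector(negative))]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthogonalize(v: Sequence[float], u: Sequence[float]) -> Vector:
    """One Gram-Schmidt step: remove from v its projection onto u."""
    scale = dot(v, u) / dot(u, u)
    return [x - scale * y for x, y in zip(v, u)]


class PID:
    """Textbook discrete PID controller (gains are hypothetical)."""

    def __init__(self, kp: float, ki: float, kd: float) -> None:
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error: float, dt: float = 1.0) -> float:
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Toy activations (assumed): two "high" and two "low" examples per attribute.
pitch_dir = diff_mean([[2.0, 0.0], [4.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
duration_raw = diff_mean([[1.0, 1.0], [1.0, 3.0]], [[0.0, 0.0], [0.0, 0.0]])

# Decouple duration from pitch so steering one does not move the other.
duration_dir = orthogonalize(duration_raw, pitch_dir)

# Feedback: the steering strength is driven by the gap between target and measurement.
controller = PID(kp=1.0, ki=0.5, kd=0.1)
target, measured = 5.0, 3.0  # hypothetical attribute values
strength = controller.step(target - measured)
steering_vector = [strength * x for x in pitch_dir]
```

In a real model the steering vector would be added to hidden activations at a chosen layer and the attribute re-measured after each generation step; that loop is omitted here.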

SHIFT: Reducing Language Bias in Multilingual Information Retrieval

SHIFT is a training-free method that mitigates language bias in multilingual information retrieval by using parallel translation pairs to estimate relative language vectors. It corrects language-specific offsets in document embeddings during indexing, improving retrieval performance across diverse models and benchmarks.

arXiv cs.AI · 8d ago
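
One plausible reading of the SHIFT summary above is sketched below: a language vector is estimated as the mean embedding difference over parallel translation pairs and then subtracted from document embeddings at indexing time. The mean-difference estimator, the function names, and the toy 2-D embeddings are assumptions for illustration, not the method as published.

```python
from typing import List, Sequence

Vector = List[float]


def estimate_language_vector(source_embs: Sequence[Sequence[float]],
                             pivot_embs: Sequence[Sequence[float]]) -> Vector:
    """Mean of (source - pivot) over parallel pairs.

    source_embs[i] and pivot_embs[i] are embeddings of the same sentence
    in the source language and in the pivot language, respectively.
    """
    if not source_embs or len(source_embs) != len(pivot_embs):
        raise ValueError("need a non-empty, equal-length set of parallel pairs")
    n = len(source_embs)
    dim = len(source_embs[0])
    return [
        sum(s[i] - p[i] for s, p in zip(source_embs, pivot_embs)) / n
        for i in range(dim)
    ]


def remove_language_offset(doc_emb: Sequence[float],
                           language_vector: Sequence[float]) -> Vector:
    """Shift a document embedding toward the pivot language before indexing."""
    return [d - o for d, o in zip(doc_emb, language_vector)]


# Toy parallel pairs (assumed values): German sentences and their English translations.
german = [[1.0, 2.0], [3.0, 4.0]]
english = [[0.5, 1.0], [2.5, 3.0]]

de_vector = estimate_language_vector(german, english)
corrected = remove_language_offset([2.0, 2.0], de_vector)
```

Because the correction is a fixed vector subtraction per language, it needs no training and can be applied once when the index is built.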

ProfiLLM: Utility-Aligned Agentic User Profiling for Industrial Ride-Hailing Dispatch

ProfiLLM introduces an agentic LLM pipeline that extracts behavioral signals from ride-hailing logs to generate user profiles. It achieves up to +6.14% relative AUC improvement and up to +4.35% GMV gain in dispatching simulations, with consistent online A/B test results showing +0.47% GMV, +0.33% Completion Rate, and -0.82% Cancel-Before-Accept rate improvements.

arXiv cs.AI · 8d ago

Self-Conditioned Credit Assignment for RL with Verifiable Rewards

SC-GRPO uses per-token KL divergence from self-conditioned trajectories to weight gradients in reinforcement learning. It outperforms GRPO by 8.1% and DAPO by 5.9% across math, code, and agentic tasks, with superior out-of-distribution performance and better results than OPD.

arXiv cs.AI · 8d ago

Rescaling MLM-Head for Neural Sparse Retrieval

A study finds that large MLM-head norms in pretrained encoders degrade sparse retrieval performance in SPLADE. Introducing a simple initialization-time rescaling of the MLM-head stabilizes training and improves performance, matching or exceeding BERT-SPLADE in multiple benchmarks.

arXiv cs.AI · 8d ago

Reinforcement Learning Foundation Models Should Already Be A Thing

Reinforcement learning lacks foundation models despite synthetic MDPs being feasible. A proof-of-concept shows a single model trained on synthetic MDPs solves tabular benchmarks without tuning, outperforming existing methods in online settings and matching them offline.

arXiv cs.AI · 8d ago

Maturing Markov Decision Processes Introduce New Decision Framework

Maturing Markov Decision Processes (MMDPs) model the asymmetric evolution of information and action availability in sequential decisions. They introduce an expiring-action priority principle and a structure-aware reinforcement learning framework that improves learning efficiency, especially in complex and scalable decision problems.

arXiv cs.AI · 8d ago

Space Is Intelligence: Neural Semigroup Superposition for Riemannian Metric Generation

Intelligence is embedded in the space itself, where scenes induce a Riemannian metric on configuration manifolds. A single Encoder-Router network uses semigroup-superposition to generate this metric, enabling zero-shot generalization across unseen obstacle configurations with large cost differences between collision-free and obstacle-penetrating paths.

arXiv cs.AI · 8d ago

Data Recipe Boosts Long-Context Reasoning in LLMs

A data-centric approach improves long-context reasoning in large language models, using eight curated datasets with 14K examples across retrieval, multi-evidence synthesis, and reasoning tasks. When paired with minimal outcome-based GRPO training, it achieves average gains of +7.2 to +6.4 points on seven benchmarks, outperforming prior RL training sets, and enhances agentic performance by +4.8 and +7.0 points on GAIA and BrowseComp respectively.