Reasoning models — korshunov.ai — ML news

Reasoning models

arXiv cs.CL · 3d ago

Validation-Gated Mechanistic Analysis of Suicidality Detection in LLMs

A validation-gated framework analyzes an LLM's internal features only after the corresponding behavior has been observed, and it identifies a mid-network feature that causally contributes to suicide detection. The feature is semantic, low-rank, shared across models, and specific to suicidality rather than general distress, though steering is necessary but not sufficient. Smaller models encode suicidality but only larger ones act on it, and the evidence is limited to English Reddit text.

arXiv cs.CL · 3d ago

Hierarchical Attention Transformers for Multi-Turn Jailbreak Detection

A new hierarchical attention model detects multi-turn jailbreaks by encoding each turn into a compact representation and using a lightweight conversation module to capture dialogue dynamics. On 14,038 conversations, it achieves an F1 score of 0.9394, outperforming Claude Opus 4.7 by 0.07 and halving the false-positive rate. Ablation studies show that combining cross-attention and self-attention in the conversation module lowers false positives by 2.26 percentage points.
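
For readers less familiar with the two metrics quoted above, here is a minimal sketch of how an F1 score and a false-positive rate are computed from confusion counts. The function names and the counts in the test are illustrative assumptions, not figures from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the positive (jailbreak) class."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def false_positive_rate(fp: int, tn: int) -> float:
    """Share of benign conversations that were wrongly flagged."""
    if fp + tn == 0:
        return 0.0
    return fp / (fp + tn)
```

A detector can raise F1 while leaving the false-positive rate unchanged, which is why the summary reports both.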

arXiv cs.CL · 3d ago

LLM-Based Multi-Reference Evaluation for Phrase Break Annotations

LMRE (LLM-Based Multi-Reference Evaluation) addresses the limitations of single-reference evaluation by modeling multiple valid phrasings of speech. It aligns with human judgment on acceptance and scoring better than traditional methods, and it demonstrates scalability and robustness for Korean speech annotations.

arXiv cs.CL · 3d ago

Answer Engineering: Local Trajectory Editing for Protocol-Constrained Decision Making

Answer Engineering introduces a runtime layer that applies localized rule-based corrections to a model's reasoning trajectory during generation, without retraining. In a clinical benchmark for sudden sensorineural hearing loss, it increased protocol-compliant outcomes from 54.5% to 83.5% and conductive-case adherence from 1.6% to 58.9%.

arXiv cs.CL · 3d ago

Coherence Illusions in Dutch LLMs Revealed

Dutch language models exhibit coherence illusions similar to human readers. Surprisal and attention entropy metrics show that models are misled by context matches, with energy from associative memory identifying discourse coherence mechanisms.
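
The two metrics named above have standard definitions, sketched below under the assumption that token probabilities and a single attention row are already available as plain Python floats. How the paper aggregates them across layers and heads is not stated in this summary, and the function names are illustrative.

```python
import math


def surprisal_bits(prob: float) -> float:
    """Surprisal of a token with model probability `prob`, in bits."""
    return -math.log2(prob)


def attention_entropy(weights: list[float]) -> float:
    """Shannon entropy (bits) of one attention row, renormalized to sum to 1."""
    total = sum(weights)
    probs = [w / total for w in weights]
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

High surprisal marks a token the model found unexpected, while high attention entropy marks a head that spreads its weight across many positions instead of focusing on a few.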

arXiv cs.CL · 3d ago

Multi-Agent Audit Framework for Clinical Mental Health Screening

A multi-agent audit framework improves clinical mental health screening by decomposing reasoning into perception, retrieval, inference, and audit stages. Evaluated on the DAIC-WOZ dataset, it reduces PHQ-8 depression severity prediction error from 5.35 to 5.02 and offers interpretable, verifiable diagnostic rationales.

arXiv cs.CL · 3d ago

Study Finds AI Still Fails to Detect Legal Citation Hallucinations

A new study finds that more than 1,000 legal filings contain fabricated citations, and the number is rising each year. A benchmark of five AI models shows improved performance, with GPT-5 reaching 82.8% recall and 60.5% F1 in agentic settings. All models still struggle with subtle errors and face resource constraints because of limited information access.

arXiv cs.CL · 3d ago

Dementia-Agents: Multi-Modal Multi-Agent System for Dementia Staging

Dementia-Agents introduces a clinically aligned multi-agent framework for real-world dementia staging and phenotyping. It improves diagnostic performance over monolithic models and prior systems, while maintaining domain-level interpretability, using data from 1,066 patients across two cognitive neurology services.

arXiv cs.CL · 3d ago

Profile-Based Reference in LLM Grounding

The paper argues that reference in large language models is not a fixed link but a profile-based, context-sensitive, and numerically structured phenomenon. It proposes that LLMs ground reference through linguistic traces parameterized via optimization, with referential profiles distributed and activated via context-sensitive computations in vector spaces.

arXiv cs.CL · 3d ago

RoPE Does Not Prevent Retrieval Heads, Study Finds

A mechanistic analysis shows retrieval heads are causally necessary for long-context recall. Higher RoPE frequencies do not reduce head counts, and zeroing low-frequency RoPE dimensions in retrieval heads degrades recall dose-dependently, with effects observed across five models and multiple architectures.
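
To make "zeroing low-frequency RoPE dimensions" concrete, here is a rough sketch of rotary position embedding on a single query or key vector, followed by one possible ablation. It assumes the interleaved pairing convention (dimensions 2i and 2i+1 rotate together) and the common base of 10000; many implementations pair the two halves of the vector instead, and the paper's exact intervention may differ. All function names are hypothetical.

```python
import math


def rope_frequencies(head_dim: int, base: float = 10000.0) -> list[float]:
    """One rotation frequency per dimension pair; later pairs rotate more slowly."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def apply_rope(vec: list[float], position: int, base: float = 10000.0) -> list[float]:
    """Rotate each (2i, 2i+1) pair of `vec` by the angle position * frequency_i."""
    out = list(vec)
    for i, theta in enumerate(rope_frequencies(len(vec), base)):
        angle = position * theta
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[2 * i], vec[2 * i + 1]
        out[2 * i] = x * c - y * s
        out[2 * i + 1] = x * s + y * c
    return out


def zero_low_frequency_pairs(vec: list[float], num_pairs: int) -> list[float]:
    """Assumed ablation: zero the `num_pairs` slowest-rotating pairs (the last ones)."""
    out = list(vec)
    n_pairs = len(vec) // 2
    for i in range(max(0, n_pairs - num_pairs), n_pairs):
        out[2 * i] = 0.0
        out[2 * i + 1] = 0.0
    return out
```

Increasing `num_pairs` step by step is one way to produce the kind of dose-response curve the summary describes, with recall measured after each step.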

arXiv cs.CL · 3d ago

SCOPE: Sequential Conformal Probing for OOD Rejection in LLMs

SCOPE introduces a framework that uses a readable hidden layer and conformal calibration to detect out-of-distribution inputs. It employs a supermartingale e-process to provide theoretical guarantees for service-boundary detection, outperforming standard final-layer detectors in multiple LLM backbones.
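
The summary does not spell out SCOPE's construction, so the following is only a generic sketch of the two ingredients it names: a conformal p-value computed against calibration scores, and a wealth process built by multiplying e-values. The power calibrator `kappa * p ** (kappa - 1)`, the threshold `1 / alpha`, and every function name are assumptions chosen for illustration. The supermartingale property also assumes that each p-value is valid given the past, which a shared calibration set does not guarantee automatically.

```python
def conformal_p_value(calibration_scores: list[float], test_score: float) -> float:
    """Rank-based p-value; a higher score means more atypical (nonconformity)."""
    n = len(calibration_scores)
    at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (1 + at_least) / (n + 1)


def e_process(p_values: list[float], kappa: float = 0.5) -> list[float]:
    """Running product of e-values; each factor has expectation at most 1
    when its p-value is valid, so the product is a test supermartingale."""
    wealth = 1.0
    path = []
    for p in p_values:
        wealth *= kappa * p ** (kappa - 1)
        path.append(wealth)
    return path


def first_rejection(path: list[float], alpha: float = 0.05):
    """Index of the first step where wealth reaches 1/alpha, or None."""
    for t, wealth in enumerate(path):
        if wealth >= 1 / alpha:
            return t
    return None
```

Under those assumptions, Ville's inequality bounds the chance that the wealth ever reaches `1 / alpha` on in-distribution traffic by `alpha`, which is the kind of anytime guarantee a sequential detector relies on.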

arXiv cs.CL · 3d ago

ARCO: Adaptive Rubric with Co-Evolution for Multi-Step LLM Agents

ARCO introduces a rubric framework that enables step-level credit assignment for multi-step LLM agents. It jointly updates a shared model with generation and scoring heads, allowing the rubric content and scoring function to co-evolve via on-policy data, improving performance and interpretability across benchmarks.

arXiv cs.CL · 3d ago

Factual Retrieval in LLMs Is Non-Contiguous and Redundant

Large language models use non-contiguous, redundant paths to retrieve factual attributes. These paths often skip layers and include multiple equivalent routes, which indicates distributed, redundant knowledge computation and challenges the current understanding of how LLMs store and retrieve knowledge.

arXiv cs.CL · 3d ago

Scientific Fine-Tuning Increases LLM Hallucinations

SciFactCheck evaluates 18 LLMs across five scientific domains, finding that scientifically fine-tuned models show degraded factual reliability and reduced internal confidence despite greater linguistic assertiveness. Human studies reveal limited agreement between fact-checking tools and expert judgments, highlighting challenges in defining valid scientific claims.

arXiv cs.CL · 3d ago

Precision-Recall Controllable Radiology Report Generation

A reinforcement learning framework enables precise control over clinical precision and recall in radiology report generation. By integrating a clinical reward and group-relative training, the model improves clinical efficacy beyond language fluency metrics, outperforming state-of-the-art methods on the MIMIC-CXR dataset.
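
"Group-relative training" most likely refers to GRPO-style advantage estimation, in which each sampled report is scored against the other reports generated for the same image. The sketch below shows only that normalization step, assuming scalar rewards are already computed. The clinical reward itself is not described in this summary, and the function name and `eps` value are illustrative.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Standardize rewards within one sampled group: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]
```

Because the baseline is the group mean, this approach needs no separate value network. A group whose samples all earn the same reward yields zero advantage and therefore no gradient signal.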

arXiv cs.CL · 3d ago

Benchmark Evaluation of Small Language Models for Arabic NLP

A benchmark of 240 Arabic test items across eight domains and ten skills assesses twelve small language models in zero-shot settings. Gemma 3 (12B) achieved the highest overall score (4.548/5), followed by Aya and C4AI Command Arabic, with performance linked more to Arabic alignment and instruction-following than model size. Common failure modes include prompt leakage, hallucination, and weak task adherence.

arXiv cs.CL · 3d ago

Two-Stage Alignment Improves Math Tutoring Pedagogy

A two-stage alignment pipeline enhances large language models' pedagogical performance in math mistake remediation. The approach combines supervised fine-tuning with Direct Preference Optimization using synthetic data on scaffolding and factuality, outperforming base and existing tutoring models in both accuracy and teaching quality. Human evaluations show the model competes with a proprietary baseline, offering greater openness and reproducibility.
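
As a reminder of what the second stage optimizes, here is a minimal sketch of the standard Direct Preference Optimization loss for a single preference pair. It assumes the summed log-probabilities of the chosen and rejected responses under the policy and under a frozen reference model are already available. The value `beta=0.1` is a common default rather than a setting reported for this paper, and the function name is illustrative.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """-log(sigmoid(beta * (chosen log-ratio - rejected log-ratio)))."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # Numerically stable softplus(-margin), which equals -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

The loss falls below log 2 once the policy favors the chosen response more strongly than the reference does, and it rises above log 2 when the preference is inverted.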

arXiv cs.CL · 3d ago

MedHal-Loc Benchmark Tests Localization Faithfulness in Medical Hallucination Detectors

MedHal-Loc introduces a benchmark to evaluate whether medical hallucination detectors accurately localize errors. It finds that while some architectures localize well above chance, a knowledge-graph pipeline performs no better than random due to poor entity extraction, despite strong detection performance. The results show that detection capability does not guarantee faithful localization, challenging assumptions about architectural explainability.

arXiv cs.CL · 3d ago

Ablation Study of Agentic RAG Components with Local 7B Model

A controlled ablation study evaluates agentic RAG components using a local 7B model on HotpotQA. Fixed hybrid retrieval outperforms adaptive routing by 1.8 EM and 1.9 F1, while two retrieval iterations capture 95% of the gains from five. Query decomposition and cross-encoder reranking show statistically significant but smaller improvements.
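
EM and F1 here are presumably the usual answer-level metrics for HotpotQA. The sketch below follows the widely used SQuAD-style recipe (lowercase, strip punctuation and English articles, then compare tokens). It is an approximation of that convention and not the paper's own scorer, and the function names are illustrative.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and the articles a/an/the, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between normalized prediction and gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Scores are typically averaged over all questions and reported on a 0 to 100 scale, so a gap of 1.8 EM means 1.8 percentage points more exactly matched answers.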

arXiv cs.CL · 3d ago

Case-Specific Dynamic Rubric Framework for Translation Evaluation

The paper proposes a dynamic rubric framework that adapts MQM evaluation spaces to individual translation instances. By selecting subtype spaces and granularities based on case-specific needs, it improves error coverage and localization, outperforming static rubric methods on WMT span-level benchmarks.