Reasoning models
arXiv cs.LG · 8d ago

LLM Belief Stabilization via Prompted Predictive Resampling

Large language models exhibit early belief drift in multiple-choice question answering, violating the martingale property. Prompted predictive resampling (PPR) reveals this drift, which self-stabilizes after sufficient resampling, leading to coherent predictive distributions. We propose a seed-answer prompting strategy and a self-consistency loss to accelerate stabilization and reduce drift, improving predictive coherence without affecting accuracy.
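As a rough illustration of the martingale property mentioned above: if beliefs form a martingale, the belief averaged over many resampled trajectories should stay at its starting value at every step. The sketch below checks only this necessary condition on toy numbers. The function name `mean_drift` and the trajectories are hypothetical and do not come from the paper.

```python
from statistics import fmean

def mean_drift(paths):
    """Gap between the path-averaged belief at each step and at step 0.

    `paths` is a list of equal-length trajectories, each giving the
    probability assigned to one fixed answer option after each
    resampling step. Under a martingale every gap is zero in
    expectation, so large gaps indicate drift.
    """
    steps = len(paths[0])
    start = fmean(p[0] for p in paths)
    return [fmean(p[t] for p in paths) - start for t in range(steps)]

# Toy data (assumed, not from the paper).
# Individual paths move, but their average stays at 0.5.
stable = [[0.5, 0.6, 0.7], [0.5, 0.4, 0.3]]
# Both paths move upward, so the average drifts away from 0.5.
drifting = [[0.5, 0.7, 0.7], [0.5, 0.6, 0.6]]

print(mean_drift(stable))
print(mean_drift(drifting))
```

A zero average gap does not prove the martingale property; it only fails to contradict it. A fuller check would condition on the history of each trajectory.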

arXiv cs.LG · 8d ago

Qwen-RobotManip Achieves Generalization in Robotic Manipulation

Qwen-RobotManip, a Vision-Language-Action foundation model, enables large-scale training through unified alignment across representation, motion, and behavior. It uses open-source data to build a 38,100-hour pretraining corpus and demonstrates emergent generalization, outperforming prior state-of-the-art models in out-of-distribution settings and ranking first in RoboChallenge with a 20% relative improvement on real-robot platforms.

arXiv cs.LG · 8d ago

MKAN: Monotonic Kolmogorov-Arnold Networks with Hard Monotonicity

MKAN introduces a Kolmogorov-Arnold Network with hard monotonicity guaranteed for all parameter values, achieved through exponential reparameterization, positive edge weights, and a monotone base activation. It enables standard gradient descent training and provides a representation-cost theorem showing that any feature extractor can be realized with monotone structure at a size no more than twice the original, offering a principled scaling rule for monotone encoders.
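The three ingredients named in the summary (exponential reparameterization, positive edge weights, a monotone base activation) can be sketched in a few lines. This is a simplified stand-in, not the MKAN architecture itself: each edge weight is `exp(theta)`, which is positive for any real `theta`, and the base activation is assumed to be `tanh`. All function names here are hypothetical.

```python
import math
import random

def monotone_layer(x, theta, bias):
    """out[j] = bias[j] + sum_i exp(theta[j][i]) * tanh(x[i]).

    exp(theta) > 0 for every real theta and tanh is increasing, so each
    output is non-decreasing in each input for all parameter values.
    """
    return [
        b + sum(math.exp(t) * math.tanh(xi) for t, xi in zip(row, x))
        for row, b in zip(theta, bias)
    ]

def monotone_net(x, layers):
    """Compose monotone layers; the composition is itself monotone."""
    for theta, bias in layers:
        x = monotone_layer(x, theta, bias)
    return x

def random_layers(sizes, rng):
    """Draw unconstrained parameters; no clipping or projection is needed."""
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        theta = [[rng.uniform(-2, 2) for _ in range(n_in)] for _ in range(n_out)]
        bias = [rng.uniform(-1, 1) for _ in range(n_out)]
        layers.append((theta, bias))
    return layers

rng = random.Random(0)
layers = random_layers([2, 3, 1], rng)
grid = [i / 4 for i in range(-12, 13)]
outputs = [monotone_net([x, 0.3], layers)[0] for x in grid]
print(all(b >= a - 1e-12 for a, b in zip(outputs, outputs[1:])))
```

Because the constraint lives in the parameterization rather than in a penalty term, plain gradient descent on `theta` can never leave the monotone family, which is the sense in which the monotonicity is "hard".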

arXiv cs.AI · 8d ago

Semantics-First Latent Modeling for 3D MRI Reconstruction

A new framework prioritizes anatomical semantics during 3D MRI latent compression, addressing long-range coherence and clinical detail loss. It introduces a Latent Harmonization Encoder and Semantic Recovery Block to preserve meaningful structures, and an Anatomy-aware Frequency Loss to maintain high-frequency diagnostic features. Experiments on public MRI datasets show improved reconstruction and cross-contrast synthesis quality.

arXiv cs.AI · 8d ago

LegalHalluLens: Auditing Hallucinations in Legal AI

LegalHalluLens introduces a framework to audit AI hallucinations in legal contexts by analyzing typed hallucination profiles across four claim categories. It reveals a 38-40 point gap between obligation/numeric and temporal claims, and shows two systems with identical 52% hallucination rates can have opposite risk directions. The framework uses a Risk Direction Index and calibrated debate pipelines to reduce fabricated detections by 45% and improve accountability in legal AI deployment.

arXiv cs.AI · 8d ago

ProvenanceGuard: Source-Aware Factuality Verification for MCP-Based LLM Agents

ProvenanceGuard introduces a source-aware verifier for MCP-based LLM agents that detects cross-source conflation by routing claims to specific evidence sources and comparing stated attribution with actual source ownership. It achieves block F1 of 0.802 and source accuracy of 0.858 on 260 source-eligible claims, outperforming source-blind baselines, and detects all injected attribution swaps in 50 clinical probes.
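The core check described above, comparing the source a claim is attributed to with the source that actually contains it, might look roughly like the following. This is a toy sketch, not ProvenanceGuard's verifier: ownership is decided by naive substring matching, and the function names, source names, and texts are all invented for illustration.

```python
def owning_sources(claim_text, sources):
    """Names of sources whose text contains the claim (naive substring match)."""
    needle = claim_text.lower()
    return {name for name, text in sources.items() if needle in text.lower()}

def check_attribution(claim_text, stated_source, sources):
    """Classify a claim as 'ok', 'misattributed', or 'unsupported'."""
    owners = owning_sources(claim_text, sources)
    if not owners:
        return "unsupported"      # no source backs the claim at all
    if stated_source in owners:
        return "ok"               # stated attribution matches ownership
    return "misattributed"        # backed, but by a different source

# Hypothetical tool outputs keyed by source name (assumed data).
sources = {
    "lab_results": "Potassium 5.9 mmol/L on admission.",
    "medication_list": "Lisinopril 10 mg daily.",
}

print(check_attribution("potassium 5.9 mmol/L", "lab_results", sources))
print(check_attribution("potassium 5.9 mmol/L", "medication_list", sources))
print(check_attribution("aspirin 81 mg", "medication_list", sources))
```

A source-blind checker would accept the second claim because the fact appears somewhere in the evidence; tracking which source owns it is what exposes the attribution swap.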