Reasoning models — korshunov.ai


ROVE enables humanoid Vision-Language-Action models to learn effective manipulation behaviors using imperfect human interventions. It combines a human-in-the-loop data collection pipeline with Optimistic Value Estimation and cross-embodiment supervision to prioritize high-value actions and improve robustness. ROVE outperforms baseline methods on real-world, contact-rich manipulation tasks through iterative rollout and intervention cycles.

arxiv arXiv cs.LG · 9d ago

ExpRL: Exploratory RL for LLM Mid-Training

ExpRL introduces a novel mid-training approach for LLMs using human-written question-answer data as reward scaffolds. Instead of imitating reference solutions, it constructs problem-specific grading rubrics to reward intermediate reasoning steps, enabling better initialization for sparse-reward RL and outperforming SFT, sparse-reward GRPO, and self-distillation on math reasoning tasks.

arxiv arXiv cs.LG · 9d ago

HABC Improves RL Fine-Tuning of VLAs with Sparse Outcomes

Hierarchical Advantage-Weighted Behavior Cloning (HABC) enhances online RL fine-tuning of vision-language-action (VLA) models by using separate critic heads for viability and efficiency. It combines their outputs through a state-adaptive gate and applies per-transition weights, while intervention-aware credit assignment prevents supervision leakage. In real-robot experiments, HABC raises success rates to 92%, 88%, and 38% on three bimanual tasks, compared with SFT baselines of 36%, 44%, and 12%.
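The summary does not give the exact form of the gate or the weights, so the following is only a rough sketch of how two critic heads could feed a per-transition behavior-cloning weight. The convex gate, the exponential (AWR-style) weighting, the clipping value, and every function name here are assumptions for illustration, not the paper's definitions. Intervention-aware credit assignment is left out because the summary does not describe it.

```python
import math


def gated_advantage(adv_viability, adv_efficiency, gate):
    """Blend two advantage estimates with a gate in [0, 1].

    In a real system the gate would come from a learned, state-dependent
    function; here it is passed in directly (assumption).
    """
    return gate * adv_viability + (1.0 - gate) * adv_efficiency


def bc_weight(advantage, temperature=1.0, max_weight=20.0):
    """Exponential advantage weight, clipped for stability (assumed form)."""
    return min(math.exp(advantage / temperature), max_weight)


def weighted_bc_loss(neg_log_probs, weights):
    """Mean of per-transition negative log-likelihoods scaled by their weights."""
    return sum(w * nll for w, nll in zip(weights, neg_log_probs)) / len(weights)


# Hypothetical batch of three transitions: (viability adv, efficiency adv, gate).
batch = [(1.0, -0.5, 0.8), (-2.0, 0.3, 0.9), (0.2, 0.4, 0.1)]
weights = [bc_weight(gated_advantage(av, ae, g)) for av, ae, g in batch]
loss = weighted_bc_loss([0.7, 1.2, 0.4], weights)
```

With a gate near 1 the viability head dominates, so transitions judged non-viable are down-weighted even when they look efficient.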

arxiv arXiv cs.LG · 9d ago

Geometric Action Model for Robot Policy Learning

The Geometric Action Model (GAM) enables robot policies to reason about 3D physical interactions by repurposing a pretrained geometric foundation model (GFM). GAM splits the GFM so that it serves as both an observation encoder and a causal future predictor, then routes predicted future geometry and actions through the same backbone. The result is accurate, robust, and efficient manipulation in simulation and real-robot benchmarks.

media r/LocalLLaMA · 10d ago

HalBench Tests 29 Open Source Models on Sycophancy and Hallucination

HalBench evaluates 29 open-source LLMs on a custom benchmark for sycophancy and hallucination. Qwen 3.6 and Gemma 4 outperform larger models, with Qwen 3.6 achieving a 36.6% pushback rate, higher than GPT-5.4 and Gemini 3.1 Pro. Model size does not correlate with honest responses, suggesting that architecture and training data matter more than parameter count.

arxiv arXiv cs.LG · 9d ago

Phase in Neural Representations: An Internal Oppenheim-Lim Test

In image classifiers such as PRISM2D, GFNet, and ViT-B/16, phase rather than magnitude drives predictions in the hidden layers. ResNet-50 reveals a latent sign code in its late blocks, indicating that phase/sign identity exists across architectures but is expressed differently because of activation and readout mechanisms.
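For readers unfamiliar with the classic Oppenheim-Lim experiment the title refers to, here is a minimal sketch on 1-D input signals: it builds a hybrid that takes its Fourier magnitude from one signal and its Fourier phase from another. The paper applies the idea to hidden representations; this example only illustrates the swap itself, and the function names are made up for the illustration.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (O(n^2)), fine for short signals."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft above."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def swap_phase(mag_source, phase_source):
    """Real signal with the magnitude spectrum of one input and the phase of the other."""
    mags = [abs(c) for c in dft(mag_source)]
    phases = [cmath.phase(c) for c in dft(phase_source)]
    hybrid = [cmath.rect(m, p) for m, p in zip(mags, phases)]
    # For real inputs the hybrid spectrum stays conjugate-symmetric,
    # so the imaginary parts are numerical noise and can be dropped.
    return [v.real for v in idft(hybrid)]


a = [1.0, 2.0, 0.5, -1.0]
b = [0.0, 1.0, 0.0, 0.0]
hybrid_signal = swap_phase(a, b)
```

Comparing which source the hybrid resembles more, or which one a classifier's prediction follows, is the essence of the test.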

arxiv arXiv cs.LG · 9d ago

Exact Posterior Score Estimation for Linear Inverse Problems

The paper derives the exact posterior score in closed form for linear Gaussian inverse problems, enabling efficient posterior sampling via denoising. It introduces Exact Posterior Score (EPS), a training objective that preserves pretraining structure and achieves superior performance on fidelity, perceptual, and distributional metrics with fewer denoiser evaluations than gradient-based methods.
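As background, the posterior score for the clean variable follows directly from Bayes' rule under a linear Gaussian measurement model. The notation below (forward operator A, noise level sigma) is assumed for illustration, and this is the textbook identity rather than the paper's EPS expression at diffusion noise levels, which the summary does not state.

```latex
% Assumed model: y = A x + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2 I)
\nabla_x \log p(x \mid y)
  = \nabla_x \log p(x) + \nabla_x \log p(y \mid x)
  = \nabla_x \log p(x) + \frac{1}{\sigma^2} A^\top (y - A x)
```

The first term is the prior score that a pretrained denoiser approximates; the second pulls the sample toward consistency with the measurement.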

arxiv arXiv cs.LG · 9d ago

Filtered Conformal Ellipsoids for Graph-Native Time Series

A new method called filtered conformal ellipsoids provides prediction sets for multivariate time series by using a frozen state-space filter to generate predictive means and covariances, then applying split-conformal calibration to Mahalanobis scores. The approach achieves coverage under dependence through contraction in an observable predictive-law quotient, with theoretical bounds derived under Gaussian-projection and observability conditions, and shows sharper ellipsoids on graph-native traffic benchmarks compared to static and non-filter baselines.
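The calibration step can be sketched as follows, assuming 2-D observations and predictive means and covariances that have already been produced by some filter. Only the split-conformal quantile on Mahalanobis scores is shown; the state-space filter, the dependence analysis, and all names are outside this sketch or hypothetical.

```python
import math


def mahalanobis_sq(y, mean, cov):
    """Squared Mahalanobis distance for 2-D vectors with a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    r0, r1 = y[0] - mean[0], y[1] - mean[1]
    # Closed-form 2x2 inverse applied to the residual on both sides.
    return (d * r0 * r0 - (b + c) * r0 * r1 + a * r1 * r1) / det


def conformal_threshold(scores, alpha):
    """Split-conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[k - 1]


def in_ellipsoid(y, mean, cov, threshold):
    """True if y lies inside the calibrated prediction ellipsoid."""
    return mahalanobis_sq(y, mean, cov) <= threshold


# Hypothetical calibration set: (observation, predictive mean, predictive covariance).
calibration = [
    ([0.5, 0.1], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]),
    ([1.2, 0.9], [1.0, 1.0], [[0.5, 0.1], [0.1, 0.5]]),
    ([2.0, 1.0], [1.5, 1.5], [[2.0, 0.0], [0.0, 1.0]]),
]
cal_scores = [mahalanobis_sq(y, m, s) for y, m, s in calibration]
threshold = conformal_threshold(cal_scores, alpha=0.5)
```

Because the covariance changes at every step, the same threshold yields ellipsoids of different size and orientation over time, which is where a well-tuned filter can make the sets sharper.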

arxiv arXiv cs.LG · 9d ago

A Mathematical Review of Shape Space Analysis in Machine Learning

This survey presents a mathematical framework for analyzing geometric data, integrating differential geometry, statistics, and machine learning. It outlines a unified pipeline for shape representation, geodesic metrics, statistical analysis, and geometry-aware learning, enabling the study of shape variability and structural trajectories across populations and time. Applications span biology, medicine, anthropology, and computer vision, highlighting challenges in handling nonlinear and unaligned geometric variation.

ROVE: Reinforcement Learning with Human Interventions for Humanoid Manipulation
