Reasoning models — korshunov.ai

EnvRL: Leveraging Environment Dynamics in Agentic RL

EnvRL introduces a framework that enhances agentic reinforcement learning by incorporating environment dynamics through state prediction and inverse dynamics objectives. When trained with GRPO, EnvRL improves success rates of Qwen-2.5-1.5B-Instruct from 72.8% to 77.4% on ALFWorld and from 56.8% to 67.0% on WebShop.

arXiv cs.LG · 8d ago

ASTEROID: Transformer for Multi-Step MD Forecasting

ASTEROID is a data-driven framework that predicts multi-step atomic coordinates in molecular dynamics simulations without iterative integration. It uses a spatiotemporal Transformer architecture to model multiscale dependencies, achieving higher accuracy and reduced computational cost compared to existing methods on quantum-mechanics derived datasets.

arXiv cs.LG · 8d ago

Fairness in Graph Neural Networks via Laplacian Adaptation

A new framework modifies the Laplacian operator in graph diffusion to enhance fairness by incorporating subspace projections, spectral adjustments, and frequency-based filtering. The method leverages graph diffusion's smoothing properties to mitigate bias, with theoretical analysis and empirical validation on synthetic and real-world datasets showing improved fairness without significant computational overhead.

arXiv cs.LG · 8d ago

Delta-Based Target Reformulation Improves Electricity Load Forecasting

A delta-based target reformulation enhances short-term electricity load forecasting by predicting load changes rather than absolute values. Results show over 50% MAPE reduction for hour-ahead forecasts across LSTM and Transformer models, with significant benefits for deep sequence models in day-ahead predictions.

arXiv cs.LG · 8d ago
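The delta-based reformulation in the item above can be illustrated with a minimal sketch: the model is trained on load changes, and the absolute forecast is recovered by adding the predicted change to the last observed load. The function names (`to_delta_targets`, `to_absolute_forecast`, `mape`) and the toy numbers are assumptions made for illustration; they are not taken from the paper.

```python
from typing import List, Sequence


def to_delta_targets(load: Sequence[float], horizon: int = 1) -> List[float]:
    """Turn an absolute load series into change-over-horizon targets."""
    if horizon < 1 or horizon >= len(load):
        raise ValueError("horizon must be in [1, len(load) - 1]")
    # Target at time t is the change between t and t + horizon.
    return [load[t + horizon] - load[t] for t in range(len(load) - horizon)]


def to_absolute_forecast(last_observed: Sequence[float],
                         predicted_deltas: Sequence[float]) -> List[float]:
    """Recover absolute forecasts: last observed load plus predicted change."""
    return [base + delta for base, delta in zip(last_observed, predicted_deltas)]


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


# Toy hourly load series (hypothetical values).
load = [100.0, 110.0, 105.0, 120.0]
deltas = to_delta_targets(load, horizon=1)
# With perfect delta predictions, the reconstruction matches the true series.
forecast = to_absolute_forecast(load[:-1], deltas)
```

In practice the deltas would come from a trained sequence model rather than from the ground truth; the sketch only shows the target transformation and its inverse.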

Vision-language models don't always need images for chest X-ray accuracy

A causal audit shows that many vision-language models achieve high chest radiograph accuracy without using images. Text-only models match multimodal models in performance and outperform them in grounding, with accuracy and confidence flags only appearing when image use occurs. These findings suggest that accuracy alone is insufficient to validate clinical deployment, and grounding must be assessed.

arXiv cs.LG · 8d ago

Blind Recovery of Latent Domains via Unsupervised Symmetry Discovery

The paper proposes an unsupervised framework to recover latent domains and signals from corrupted observations by discovering data symmetries. It models observations as linear measurements of signals from a latent random field and uses a shallow group-convolutional network with stationarity and locality constraints to learn latent symmetry actions and filters, enabling recovery from unstructured data.

arXiv cs.LG · 8d ago

Lightweight Experiential Latent Memories for Continual Self-Improvement

A new method enables large language models to learn from their own reasoning traces without external supervision. By distilling inference-time computation into lightweight, modular latent memories, the model achieves performance competitive with full training and outperforms zero-shot and raw in-context learning (ICL) baselines on mathematical reasoning tasks, with minimal computational overhead.

arXiv cs.LG · 8d ago

Conservation Laws for Modern Neural Architectures

This paper introduces a unified framework to identify conservation laws in gradient flow for modern neural architectures. It covers feedforward networks with GELU, SiLU, and SwiGLU activations, multihead attention with sinusoidal and rotary positional encodings, and Mixture-of-Experts models under various gating schemes. Experiments validate the predicted invariants, supporting the theoretical findings.

arXiv cs.LG · 8d ago

Functional Equivalence in Attention with Positional Encodings

A comprehensive study reveals that sinusoidal positional encodings preserve functional equivalence in Transformers, while rotary positional encodings reduce symmetry, enhancing expressivity. The research shows that positional encodings critically influence linear mode connectivity, with empirical results demonstrating variability in connectivity depending on the encoding used.

arXiv cs.LG · 8d ago

LLM Belief Stabilization via Prompted Predictive Resampling

Large language models exhibit early belief drift in multiple-choice question answering, violating the martingale property. Prompted predictive resampling (PPR) reveals this drift, which self-stabilizes after sufficient resampling, leading to coherent predictive distributions. The authors propose a seed-answer prompting strategy and a self-consistency loss to accelerate stabilization and reduce drift, improving predictive coherence without affecting accuracy.

arXiv cs.LG · 8d ago
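For readers unfamiliar with the martingale property mentioned in the item above, one common way to state it for predictive distributions is sketched below. The notation is an assumption made for illustration and may differ from the paper's: $p_n$ denotes the model's predictive distribution over answer options after $n$ resampled answers $y_{1:n}$.

```latex
\mathbb{E}\!\left[\, p_{n+1}(y) \mid y_{1:n} \,\right] = p_n(y)
\quad \text{for every answer option } y .
```

Under this reading, belief drift means that the expected next-step prediction differs from the current one, so the sequence $p_1, p_2, \dots$ shifts systematically instead of fluctuating around its current value.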

Qwen-RobotManip Achieves Generalization in Robotic Manipulation

Qwen-RobotManip, a Vision-Language-Action foundation model, enables large-scale training through unified alignment across representation, motion, and behavior. It uses open-source data to build a 38,100-hour pretraining corpus and demonstrates emergent generalization, outperforming prior state-of-the-art models in out-of-distribution settings and ranking first in RoboChallenge with a 20% relative improvement on real-robot platforms.

arXiv cs.LG · 8d ago

WallZero Beats Go Pros in WallGo

WallZero, an AlphaZero-based agent, defeats two professional Go players in WallGo, averaging 1.98x more territory per game. The study finds that the opening from the Netflix series creates a more balanced game, suggesting improved fairness in play.

arXiv cs.LG · 8d ago

Order-Independent Cell-Level Representations for Multi-Task Table Recognition

This paper introduces a structural refinement module using non-causal attention to generate order-independent cell features in autoregressive multi-task table recognition. The approach enables parallel cell content inference while maintaining global context, improving cell localization and end-to-end recognition with a threefold reduction in inference time.

arXiv cs.LG · 8d ago

MKAN: Monotonic Kolmogorov-Arnold Networks with Hard Monotonicity

MKAN introduces a Kolmogorov-Arnold Network with hard monotonicity guaranteed for all parameter values, achieved through exponential reparameterization, positive edge weights, and a monotone base activation. It enables standard gradient descent training and provides a representation-cost theorem showing that any feature extractor can be realized with monotone structure at a size no more than twice the original, offering a principled scaling rule for monotone encoders.

arXiv cs.LG · 8d ago
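The idea of hard monotonicity through exponential reparameterization, positive weights, and a monotone activation, as described in the item above, can be sketched with a single unit. This is a simplified illustration and not the MKAN architecture: the function `monotone_unit`, the use of `tanh` as the monotone activation, and the example values are all assumptions.

```python
import math
from typing import Sequence


def monotone_unit(x: Sequence[float],
                  raw_weights: Sequence[float],
                  bias: float = 0.0) -> float:
    """Unit that is non-decreasing in every input for any raw parameter values.

    Each effective weight is exp(raw_weight), which is strictly positive, and
    tanh is an increasing function, so raising any input cannot lower the output.
    """
    pre_activation = bias + sum(math.exp(w) * xi for w, xi in zip(raw_weights, x))
    return math.tanh(pre_activation)


# Raw parameters are unconstrained, so plain gradient descent can update them.
raw_weights = [-3.0, 0.5, 2.0]
x = [0.1, -0.2, 0.3]
x_bumped = [0.1, 0.3, 0.3]  # second input increased by 0.5

y = monotone_unit(x, raw_weights)
y_bumped = monotone_unit(x_bumped, raw_weights)
```

Because the constraint is built into the parameterization rather than enforced by a penalty, monotonicity holds at every training step and not only at convergence.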

Dimensionality Controls When Modularity Helps in Continual Learning

Modular architectures improve compositional continual learning only in low-dimensional regimes, where representational subspaces partially align for similar tasks. In high-dimensional regimes, modular and single networks perform similarly, indicating that the benefit of modularity depends on the representational dimensionality induced by the initialization scale.

arXiv cs.LG · 8d ago

Hybrid Ret-DNN with XGBoost for Customer Behavior Forecasting

A study proposes a hybrid Ret-DNN with XGBoost model to forecast customer behavior in e-commerce. Using 500,000 transaction records from a UK retailer, the model achieves a Mean Absolute Error of 0.2193, outperforming the existing Ret-DNN model.

arXiv cs.LG · 8d ago

SoftMoE: Soft Differentiable Routing for Mixture-of-Experts in LLMs

SoftMoE replaces discrete top-k routing with a differentiable soft top-k LapSum relaxation, enabling gradient-based optimization of expert selection. It learns to allocate expert activation non-uniformly across layers, with later layers activating more experts, while using significantly fewer experts than traditional sparse MoE.

arXiv cs.LG · 8d ago

CERS: CoT-Enhanced Reasoning for Medical Image Segmentation

CERS introduces Chain-of-Thought reasoning to improve semi-supervised medical image segmentation by integrating linguistic descriptions from large language models. It uses a semantic-aware reference selection and multi-scale coordinate attention to resolve boundary ambiguities and semantic inconsistencies, outperforming state-of-the-art methods in clinical scenarios with visual-semantic mismatch.

arXiv cs.LG · 8d ago

Half-Link Sufficiency in Knowledge Graph Foundation Models

A new study shows that knowledge graph foundation models (KGFMs) can predict whole links using only partial observations, such as half-links. It identifies four scenarios based on observed half-links and reveals that state-of-the-art models leverage seen half-links, while unseen ones present significant generalization challenges. This taxonomy offers a diagnostic framework for evaluating and improving KGFM robustness.

arXiv cs.AI · 8d ago

STAR: SpatioTemporal Adaptive Reward Allocation for Text-to-Image RL Post-Training

STAR introduces a spatio-temporal reward allocation method for text-to-image generation, using attention maps to dynamically assign advantages across denoising steps. It improves semantic alignment, text rendering, and preference optimization in Stable Diffusion 3.5 Medium, scoring 0.9759 on GenEval, 0.9757 on OCR, and 23.60 on PickScore.