Reasoning models — korshunov.ai — ML news

arXiv cs.CL · 8d ago

Darshana Graph: A Corpus for Comparative Indian Philosophy

Darshana Graph presents a corpus of over 125,000 text records from Hindu, Buddhist, and Jain philosophical sources. It includes a unique subset of 8,500 aligned records from 18 commentators across five schools, enabling cross-commentator comparison. The corpus supports stylometric analysis and a large language model pipeline that extracts philosophical concept relationships, revealing disagreement patterns and extraction limitations.

arXiv cs.LG · 8d ago

ReLAR: Reinforcement-Guided Latent Refinement for Stable LLM Reasoning

ReLAR introduces a reinforcement-guided framework that iteratively refines hidden states to improve LLM reasoning stability. It uses learned depth and action controllers trained via policy gradients to adaptively determine refinement steps, achieving better accuracy and generation quality with lower inference overhead than explicit reasoning methods.

arXiv cs.LG · 8d ago

Domain-Validity-Gated Metamorphic Testing for SciML Surrogates

A domain-validity rubric screens candidate metamorphic relations by checking that each relation's tolerance exceeds the numerical floor and that its preconditions are met. The method turns valid relations into executable, oracle-free test assets. It is validated across multiple CFD tasks and PDE families, where it distinguishes model violations from out-of-domain applications.
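To make the gating idea concrete, here is a minimal sketch in Python. It is not the paper's rubric: the function names, the toy surrogate, the symmetry relation, and the numeric floor are all hypothetical choices made for illustration. It only shows the two conditions the summary names (tolerance above the numerical floor, preconditions satisfied) followed by an oracle-free comparison.

```python
import math

def relation_is_admissible(tolerance, numerical_floor, preconditions):
    """Gate a candidate relation: the tolerance must exceed the numerical
    floor and every precondition must hold (hypothetical rubric)."""
    return tolerance > numerical_floor and all(preconditions)

def check_relation(model, x, transform_input, transform_output, tolerance):
    """Oracle-free check: compare model(T_in(x)) with T_out(model(x)).
    No ground-truth value for model(x) is needed."""
    lhs = model(transform_input(x))
    rhs = transform_output(model(x))
    return abs(lhs - rhs) <= tolerance

# Hypothetical surrogate: a quantity that should be even in its input.
def surrogate(v):
    return 0.5 * v * v

x = 3.0
tol = 1e-9
floor = 1e-12  # assumed level of floating-point noise

admissible = relation_is_admissible(tol, floor, [math.isfinite(x)])
# Relation under test: flipping the sign of the input leaves the output unchanged.
passed = check_relation(surrogate, x, lambda v: -v, lambda y: y, tol)
```

A relation that fails the gate is simply not run, so a later failure can be attributed to the model rather than to a tolerance below numerical noise or an input outside the relation's domain.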

arXiv cs.LG · 8d ago

NMF with Topological Regularisation for Interpretable Bases

A new method integrates persistent homology into non-negative matrix factorisation to regularise the topology of basis functions. This approach enables spatially coherent image components, periodic time-series, and clique-like graph signals by using threshold-free topological scores as regularisers in the NMF objective.

arXiv cs.LG · 8d ago

Preference-Based Trajectory Evaluation for Agentic Systems

Under standard success-based metrics, offline evaluation of agentic systems produces tied comparisons in 75% of cases. Preference-based trajectory evaluation reduces ties to 35% by comparing progress and time-to-return profiles, which improves discriminative power and data efficiency. These results suggest that benchmark saturation may stem from the choice of evaluation method, not just from data or problem difficulty.

arXiv cs.LG · 8d ago

CARLOS: Deep RL for Continuous-time Optimal Stopping

CARLOS uses an aggregate deep neural network to learn a joint space-time exercise boundary for optimal stopping problems. It progressively refines stopping decisions at finer time resolutions and employs adaptive sampling to focus training near the stopping boundary. Benchmark results show CARLOS outperforms existing Bermudan solvers and approaches the American upper bound with high efficiency.

arXiv cs.LG · 8d ago

Reversal Q-Learning: A New Off-Policy RL Algorithm

Reversal Q-Learning (RQL) is a new off-policy reinforcement learning algorithm that trains a flow policy using prior data. By modeling flow refinement steps as actions in an expanded Markov decision process and applying virtual on-policy trajectories via reversal, RQL enables effective offline learning without backpropagation through time. Experiments on 50 robotic tasks show RQL achieves the best average performance among state-of-the-art flow-based offline RL methods.

arXiv cs.LG · 8d ago

ST-CND Framework for Early Warning of Geographic Tipping Points

SpatioTemporal Causal Network Diagnostics (ST-CND) introduces a data-driven framework to detect geographic tipping points by modeling spatial fields as time-evolving causal networks. It outperforms existing methods on sea-surface temperature benchmarks, achieving an AUROC of 0.783 and a critical-subnetwork IoU of 0.378 for the North Atlantic AMOC.

arXiv cs.LG · 8d ago

Credit-in-Event: Re-Anchoring Event Credit in Dynamics Models

A new method called Credit-in-Event identifies and addresses temporal credit dilution in learned dynamics models. CREST, a label-free and training-free readout, re-anchors pooled representations by estimating transient event cores and applying event-versus-rest contrast, reducing out-of-distribution error across multiple systems and data types. Ablations confirm the improvement stems from event-core credit re-anchoring, not generic locality or stability priors.

arXiv cs.LG · 8d ago

LLM Features Can Hurt GNNs via Concatenation Interference

Concatenating LLM-generated features to graph neural networks systematically reduces accuracy on homophilous benchmarks, with PubMed accuracy dropping by 17.0 ± 0.3 pp. A measure of LLM-alone discriminability, Delta_sig, correlates with concatenation performance (r^2 = 0.38), and a rule based on Delta_sig <= 13.8 pp correctly predicts non-positive impact in 7 out of 9 datasets.
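The threshold rule can be sketched as a one-line classifier. The snippet below is an illustration only: the 13.8 pp threshold comes from the summary, but the function names, the reading of the rule as a binary predictor, and the example records are assumptions (the records are placeholder values, not results from the paper).

```python
THRESHOLD_PP = 13.8  # threshold quoted in the summary, in percentage points

def predicts_non_positive_impact(delta_sig_pp, threshold_pp=THRESHOLD_PP):
    """Rule: a Delta_sig at or below the threshold predicts that
    concatenating LLM features will not help (hypothetical reading)."""
    return delta_sig_pp <= threshold_pp

def rule_accuracy(records):
    """Fraction of datasets where the rule's prediction matches the observed
    sign of the accuracy change. Each record is (delta_sig_pp, gain_pp)."""
    hits = sum(
        predicts_non_positive_impact(delta_sig) == (gain <= 0.0)
        for delta_sig, gain in records
    )
    return hits / len(records)

# Placeholder records for illustration only (not data from the paper).
example_records = [(5.0, -4.0), (20.0, 1.2), (10.0, 0.5)]
accuracy = rule_accuracy(example_records)
```

On these placeholder records the rule is right for the first two datasets and wrong for the third, which mirrors how a figure such as "7 out of 9" would be tallied.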

arXiv cs.LG · 8d ago

SelFix: Root-Selecting Fixed-Point Inversion for Rectified Flows via Trajectory Straightness

SelFix improves fixed-point inversion by selecting solutions that produce straighter inverse trajectories, enhancing real-image reconstruction and source-preserving editing. Experiments on FLUX.1-dev and PIE-Bench show it outperforms prior baselines in both reconstruction quality and editing fidelity.

arXiv cs.LG · 8d ago

SkillMigrator: Transferable Interaction Patterns for Web Agent Efficiency

SkillMigrator learns reusable web skills by matching layout structures instead of element references. It stores each skill as a transferable interaction pattern with a structural sketch, enabling efficient skill transfer across sites. Compared to state-of-the-art methods, it reduces average LLM-action counts by 8-10% on WebArena and Mind2Web at matched success rates.

arXiv cs.LG · 8d ago

Risk Decomposition Framework for Pre-Hoc Fine-Tuning Prediction

A new framework decomposes pre-hoc fine-tuning prediction risk into intrinsic limits and optimization variance. It proves a necessary lower bound on variance decay and introduces a budget-optimal probing strategy, validated across synthetic and real-world benchmarks through three distinct prediction regimes.

arXiv cs.LG · 8d ago

Physics-Constrained Neural Networks Improve Weather Forecasting

A study enhances physics-constrained neural networks by introducing an upgraded numerical solver, a unified autoregressive block, and two neural backbones. These improvements reduce root mean squared error by 8-22% in short-term forecasts over the South Pacific and better preserve physical consistency.

arXiv cs.LG · 8d ago

TUNEAHEAD Predicts Fine-tuning Performance Before Training

TUNEAHEAD is a lightweight framework that predicts fine-tuning performance using meta-feature vectors from dataset descriptors and short probe runs. It outperforms baselines like Early-Stop Extrapolation and ProxyLM, achieving an RMSE of 1.47 percentage points and 95.1% of predictions within ±3 percentage points of true scores on 370 held-out runs.
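For readers unfamiliar with the two reported metrics, the following sketch shows how an RMSE in percentage points and a "share within ±3 points" figure are typically computed. The function names and the sample scores are hypothetical and do not come from the paper.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and true scores."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def share_within(predicted, actual, band=3.0):
    """Fraction of predictions whose absolute error is at most `band`."""
    inside = sum(abs(p - a) <= band for p, a in zip(predicted, actual))
    return inside / len(actual)

# Placeholder scores in percentage points (illustration only).
predicted = [70.0, 82.0, 55.0, 91.0]
actual = [71.0, 80.0, 59.0, 91.0]

error = rmse(predicted, actual)
coverage = share_within(predicted, actual, band=3.0)
```

The two numbers answer different questions: RMSE penalises large misses heavily, while the within-band share reports how often a prediction is close enough to act on.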

arXiv cs.LG · 8d ago

Learnable Graph Patches for Feature Heterogeneity

The paper proposes learnable graph patches as the smallest semantic units in graph data, addressing feature heterogeneity without relying on textual information. The framework uses patch encoders and aggregators to extract and combine knowledge across domains, enabling universal pre-training and downstream performance that improves as pre-training data grows.

arXiv cs.LG · 8d ago

EnvRL: Leveraging Environment Dynamics in Agentic RL

EnvRL introduces a framework that enhances agentic reinforcement learning by incorporating environment dynamics through state prediction and inverse dynamics objectives. When trained with GRPO, EnvRL improves success rates of Qwen-2.5-1.5B-Instruct from 72.8% to 77.4% on ALFWorld and from 56.8% to 67.0% on WebShop.

arXiv cs.LG · 8d ago

ASTEROID: Transformer for Multi-Step MD Forecasting

ASTEROID is a data-driven framework that predicts multi-step atomic coordinates in molecular dynamics simulations without iterative integration. It uses a spatiotemporal Transformer architecture to model multiscale dependencies, achieving higher accuracy and lower computational cost than existing methods on quantum-mechanics-derived datasets.

arXiv cs.LG · 8d ago

Fairness in Graph Neural Networks via Laplacian Adaptation

A new framework modifies the Laplacian operator in graph diffusion to enhance fairness by incorporating subspace projections, spectral adjustments, and frequency-based filtering. The method leverages graph diffusion's smoothing properties to mitigate bias, with theoretical analysis and empirical validation on synthetic and real-world datasets showing improved fairness without significant computational overhead.

arXiv cs.LG · 8d ago

Delta-Based Target Reformulation Improves Electricity Load Forecasting

A delta-based target reformulation enhances short-term electricity load forecasting by predicting load changes rather than absolute values. Results show over 50% MAPE reduction for hour-ahead forecasts across LSTM and Transformer models, with significant benefits for deep sequence models in day-ahead predictions.
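The reformulation can be illustrated with a short sketch. It assumes the delta is taken against the last observed load at forecast time, which the summary does not specify; the function names and the sample series are hypothetical.

```python
def make_delta_targets(load, horizon=1):
    """Training targets: change in load over `horizon` steps,
    instead of the absolute load at the future step."""
    return [load[t + horizon] - load[t] for t in range(len(load) - horizon)]

def reconstruct(last_observed, predicted_delta):
    """Turn a predicted change back into an absolute load forecast."""
    return last_observed + predicted_delta

# Placeholder hourly load values (illustration only).
load = [100.0, 104.0, 103.0, 110.0]

deltas = make_delta_targets(load, horizon=1)
forecast = reconstruct(load[0], deltas[0])
```

The model then learns a target centred near zero, and the absolute forecast is recovered by adding the predicted change to the most recent measurement.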