Reasoning models — korshunov.ai

DIPHINE: Neural Estimator for $Φ$ID in Continuous Systems

DIPHINE is presented as the first neural estimator to use score-based diffusion models to jointly estimate, from a single amortized network, all mutual information terms required by Integrated Information Decomposition ($Φ$ID). It recovers the sixteen non-overlapping information atoms via Möbius inversion. A theoretical analysis shows that synergy-to-synergy estimation is the most challenging case, and the estimator gives accurate results on synthetic benchmarks and real-world physiological data.

arXiv cs.LG · 8d ago
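
The Möbius inversion step named in the DIPHINE summary can be illustrated on a much simpler structure. The sketch below is not the paper's procedure: it assumes a Boolean (subset) lattice over a hypothetical two-element ground set, whereas $Φ$ID works on a different sixteen-atom lattice with its own ordering. The function names and the numeric values are made up for illustration.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def mobius_inversion(cumulative):
    """Recover atoms g from cumulative sums f(S) = sum of g(T) over T subset of S.

    `cumulative` maps frozenset -> float and must contain every subset of the
    ground set. On the Boolean lattice the Mobius function is (-1)^(|S|-|T|).
    """
    atoms = {}
    for s in cumulative:
        atoms[s] = sum(
            (-1) ** (len(s) - len(t)) * cumulative[t] for t in subsets(s)
        )
    return atoms


# Hypothetical atoms on the ground set {a, b} (illustrative values only).
true_atoms = {
    frozenset(): 0.0,
    frozenset("a"): 0.5,
    frozenset("b"): 0.25,
    frozenset("ab"): 1.0,
}

# Cumulative quantities: each entry sums the atoms at or below it.
cumulative = {
    s: sum(v for t, v in true_atoms.items() if t <= s) for s in true_atoms
}

recovered = mobius_inversion(cumulative)
```

Here `cumulative` plays the role of the directly estimable quantities and `recovered` the non-overlapping atoms; on a different lattice only the ordering and the Möbius function would change.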

Spotlight: Using Spot GPUs to Accelerate DiT RL Post-Training

Spotlight runs DiT RL post-training on idle spot GPUs, reducing costs by 1.4x to 6.4x while achieving superior image quality. It uses stale model weights during exploration and reconfigures sequence parallelism on the fly, keeping GPUs efficiently utilized without breaking the training pipeline.

arXiv cs.LG · 8d ago

LSTM-Vision Transformer Improves HRRR Forecast Error Prediction

A hybrid LSTM-Vision Transformer framework enhances prediction of HRRR forecast errors by integrating atmospheric profiles from mesonet profilers. It achieves up to twofold improvement in precipitation error prediction, especially during active planetary boundary layer periods, by better capturing convective error evolution and reducing PBL-related degradation.

arXiv cs.LG · 8d ago

RL Reward Types Enhance Resilience in Cyber-Physical Systems

A study evaluates model-free reinforcement learning controllers for nonlinear systems under cyberattacks. The Lyapunov reward offers the best resilience with low tracking error, while Proximal Policy Optimization outperforms Deep Deterministic Policy Gradient in reducing KPI variance.

arXiv cs.LG · 8d ago

Structure-First Architectures for Dynamical Learning

A new paradigm for dynamical systems learning prioritizes structural design over nonlinear expressivity. The proposed wave-inspired dynamical units use explicit, causal interactions to form layered architectures that exhibit emergent hierarchical behavior and informative internal representations, even with minimal parameter optimization.

arXiv cs.LG · 8d ago

Smoothness-Based Derandomization of PAC-Bayes Bounds

A new framework derandomizes PAC-Bayes bounds for smooth loss functions by analyzing the generalization gap of the Jensen gap class via Rademacher complexity. The resulting bounds for deterministic predictors involve flatness measures derived from Jacobians and Hessians of the score map, and are applied to linear models and smooth neural networks. A practical regularizer is proposed, computed using folded BatchNorm weights, and validated on CIFAR-10 with varying batch sizes.

arXiv cs.LG · 8d ago

JourneyFormer: Sequence Modeling for Airbnb Guest Journeys

JourneyFormer is a sequence modeling solution deployed at Airbnb to improve search ranking. It addresses production challenges like long, exploratory guest sequences and sparse booking labels through tailored design choices in data selection, embeddings, and label attribution. The model has shown improved offline metrics and significant business gains in online A/B tests across multiple production surfaces.

arXiv cs.LG · 8d ago

ViGOS: Decoupling Perception and Reasoning in Multimodal On-Policy Self-Distillation

ViGOS introduces a visually grounded on-policy self-distillation framework for multimodal large language models. It decouples perception and reasoning by using an image-only teacher for visual descriptions and a reasoning teacher for final outputs, reducing reliance on text-only references. This approach improves image-grounded performance across multiple vision-language benchmarks.

arXiv cs.LG · 8d ago

INDEQS: Graph-Informed Neural Controlled Differential Equations

INDEQS introduces a graph-based neural controlled differential equation framework that incorporates prior knowledge of a directed graph at the architectural level. It separates inner and outer mixing and offers both graph-constrained and data-adaptive variants. Outer informedness reduces mean absolute error on larger graphs, while inner informedness provides parameter efficiency when the model must adhere to a known adjacency. Continuous decoders outperform discrete ones in real-world traffic and hydrological forecasting tasks.

arXiv cs.LG · 8d ago

ChronoSurv: A Graph Framework for Multimodal Survival Analysis

ChronoSurv introduces a hierarchical directed graph framework that models patient care as a progression-aware clinical trajectory. It achieves state-of-the-art performance in multimodal survival prediction by capturing structured clinical workflows and handling missing data through heterogeneous message passing.

arXiv cs.LG · 8d ago

OrthoReg: Orthogonal Regularization for Hybrid Symbolic-Neural Dynamical Systems

OrthoReg introduces orthogonal regularization to prevent neural components from relearning symbolic structures in hybrid dynamical systems. By directly penalizing overlap between symbolic and neural parts, it enables a complementary decomposition where symbolic models capture expressible physics and neural models handle remaining dynamics. On benchmarks with partial library mismatch, OrthoReg improves symbolic recovery and out-of-distribution performance.

arXiv cs.CL · 8d ago
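
The idea of penalizing overlap between the symbolic and neural parts, as described in the OrthoReg summary, might look roughly like the sketch below. The summary does not state the exact penalty, so the squared cosine similarity used here is an assumption, and `overlap_penalty`, `total_loss`, and the weight `lam` are hypothetical names.

```python
import math


def overlap_penalty(symbolic_out, neural_out, eps=1e-12):
    """Squared cosine similarity between two output vectors over a batch.

    Close to 0.0 when the outputs are orthogonal and close to 1.0 when one
    is a scalar multiple of the other. (Assumed penalty form, not the paper's.)
    """
    dot = sum(s * n for s, n in zip(symbolic_out, neural_out))
    norm_s = math.sqrt(sum(s * s for s in symbolic_out))
    norm_n = math.sqrt(sum(n * n for n in neural_out))
    cos = dot / (norm_s * norm_n + eps)
    return cos * cos


def total_loss(target, symbolic_out, neural_out, lam=0.1):
    """Mean squared error of the summed prediction plus the weighted penalty."""
    errors = [
        (t - (s + n)) ** 2
        for t, s, n in zip(target, symbolic_out, neural_out)
    ]
    mse = sum(errors) / len(errors)
    return mse + lam * overlap_penalty(symbolic_out, neural_out)
```

With a penalty of this shape, a neural component that merely rescales the symbolic output is discouraged, while one that fits only the residual the symbolic part cannot express incurs little cost.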

Fair Cognitive Impairment Detection Through Unlearning

A multimodal framework combines speech, text, and image data with gradient reversal unlearning to reduce demographic bias in Mild Cognitive Impairment detection. The method outperforms existing multilingual and multimodal baselines on TAUKADIAL and PREPARE, with reduced performance gaps across sex and language subgroups, and shows improved transfer across datasets.

arXiv cs.CL · 8d ago

Speech-Driven End-to-End Language Discrimination for Chinese Dialects

A study evaluates speech-driven MFCC features and an HMM-DNN model with attention mechanisms for Chinese dialect discrimination. The approach combines word-level embeddings and MFCC features using a CNN, achieving superior performance on benchmark dialect corpora compared to existing methods.

arXiv cs.CL · 8d ago

Distance-Adaptive Representation for Attention

A new attention mechanism, Distance-Adaptive Representation (DAR), assigns richer representations to nearby tokens and reduced dimensions to distant ones. The approach matches full-dimensional performance across multiple model scales and under fine-tuning, and it outperforms uniform dimensionality reduction.

arXiv cs.CL · 8d ago

CDDTLDA: Transfer Learning for Chinese Dialect Discrimination

A novel framework named CDDTLDA uses transfer learning and data augmentation to address Chinese dialect discrimination with limited annotations. It trains a source ASR model on a large dialect corpus, applies speed, pitch, and noise augmentation to low-resource target dialects, and fine-tunes a target ASR model using self-attention to capture shared semantic features. Experimental results show that CDDTLDA outperforms state-of-the-art methods on two benchmark Chinese dialect corpora.

arXiv cs.CL · 8d ago
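
Two of the augmentations named in the CDDTLDA summary, speed and noise, can be sketched on a raw sample list as follows. This is a generic illustration and not the paper's pipeline: the function names, the linear-interpolation resampler, and the signal-to-noise parameter are assumptions. Pitch shifting independent of speed needs more machinery (for example a phase vocoder) and is left out; note that plain resampling like this changes pitch along with speed.

```python
import math
import random


def change_speed(signal, factor):
    """Resample by linear interpolation; factor > 1 shortens the signal."""
    n_out = max(1, int(len(signal) / factor))
    out = []
    for i in range(n_out):
        pos = i * factor                      # fractional index into the input
        lo = int(pos)
        hi = min(lo + 1, len(signal) - 1)     # clamp at the last sample
        frac = pos - lo
        out.append((1.0 - frac) * signal[lo] + frac * signal[hi])
    return out


def add_noise(signal, snr_db, rng):
    """Add Gaussian noise scaled to a target signal-to-noise ratio in dB."""
    power = sum(x * x for x in signal) / len(signal)
    noise_power = power / (10.0 ** (snr_db / 10.0))
    std = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, std) for x in signal]


# Hypothetical input: a short sine tone standing in for an utterance.
tone = [math.sin(2.0 * math.pi * i / 20.0) for i in range(200)]
rng = random.Random(0)
augmented = [
    change_speed(tone, 0.9),
    change_speed(tone, 1.1),
    add_noise(tone, 20.0, rng),
]
```

Each augmented copy would keep the label of the original utterance, which is how such augmentation enlarges a low-resource training set.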

Steerable Cultural Preference Optimization of Reward Models

This paper introduces SCPO, a novel reward model training algorithm that balances diverse cultural preferences across subcommunities. SCPO improves minority reward model performance by up to 7 points across two datasets and seven countries, while being up to 280% more data-efficient in training than full-data fine-tuning. Analysis shows reduced bias through targeted subcommunity preference evaluation.

arXiv cs.CL · 8d ago

PhysAssistBench Evaluates LLMs in Doctor-Patient-EHR Interaction

PhysAssistBench introduces a benchmark for interactive doctor-patient-EHR assistance using real MIMIC-IV cases. It features 1,296 manually reviewed, physician-validated turns and reveals that current LLMs struggle with coordinating clinical knowledge, communication, and EHR system interaction.

arXiv cs.CL · 8d ago

BCL: Bayesian In-Context Learning for Information Extraction

BCL is the first framework that uses particle filtering and Bayesian updates to systematically refine label representations in information extraction. It achieves consistent performance across model scales and generalizes to both sequence labeling and relation classification through four key steps: initialization, observation, weight update, and resampling.

arXiv cs.CL · 8d ago
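
The four steps listed in the BCL summary (initialization, observation, weight update, resampling) are the standard loop of a particle filter. The sketch below shows that generic loop on a toy problem, estimating a hidden scalar from noisy readings. It does not reproduce how BCL represents labels; the prior range, the Gaussian likelihood, and the jitter term are all assumptions made for illustration.

```python
import math
import random


def gaussian_likelihood(obs, particle, sigma):
    """Unnormalized likelihood of `obs` if the hidden value were `particle`."""
    return math.exp(-0.5 * ((obs - particle) / sigma) ** 2)


def particle_filter(observations, n_particles=500, sigma=1.0,
                    jitter=0.05, seed=0):
    rng = random.Random(seed)

    # 1. Initialization: draw particles from a broad uniform prior.
    particles = [rng.uniform(-10.0, 10.0) for _ in range(n_particles)]

    for obs in observations:
        # 2. Observation: `obs` is the next data point.

        # 3. Weight update: weight each particle by the likelihood of `obs`.
        weights = [gaussian_likelihood(obs, p, sigma) for p in particles]
        if sum(weights) == 0.0:
            weights = [1.0] * n_particles  # fall back to uniform weights

        # 4. Resampling: redraw particles in proportion to their weights,
        #    then add small noise so the population does not collapse.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        particles = [p + rng.gauss(0.0, jitter) for p in particles]

    # Posterior mean estimate of the hidden value.
    return sum(particles) / n_particles


# Hypothetical readings scattered around a hidden value of 3.0.
readings = [2.8, 3.1, 3.0, 2.9, 3.2] * 4
estimate = particle_filter(readings)
```

Because particles are resampled after every observation, their weights are uniform at the start of each step, so the weight update reduces to evaluating the likelihood.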

PragReST: Self-Reinforcing Counterfactual Reasoning for Pragmatic Language Understanding

PragReST is a self-supervised framework that enhances large language models' pragmatic reasoning by generating counterfactual reasoning traces and training via supervised fine-tuning and reinforcement learning. It outperforms baseline models on four pragmatic benchmarks, improving Qwen3-8B and Qwen3-14B by 5.37% and 5-5.50% accuracy respectively, and maintains strong performance on general-knowledge and mathematical reasoning tasks.

arXiv cs.CL · 8d ago

PEC-Home: Simulated Dataset for Elliptical Command Interpretation

PEC-Home is the first simulated dataset designed to enable smart home assistants to interpret progressively elliptical commands. Experiments show that even with dialogue history tools, LLMs like GPT-4o fail to achieve accurate command execution from elliptical inputs, highlighting a significant gap in current assistant capabilities.