Reasoning models — korshunov.ai

Style Diversity Outperforms Topic Diversity in Annotation-Free Synthetic Data

A new framework generates synthetic dialogue data without human annotation, using only intent definitions. It incorporates topic and style attributes, the post-hoc stylization models Univ and Exam, and an LLM-as-a-judge filtering step. The synthetic data reaches up to 93.3% of the performance obtained with human-annotated data, and the results indicate that style diversity matters more than topic diversity for data utility.

arXiv cs.LG · 7d ago

Direct Advantage Estimation for Partially Observable Domains

Direct Advantage Estimation (DAE) is extended to partially observable domains with minimal modifications. A discrete latent dynamics model reduces computational overhead by efficiently approximating transition probabilities, enabling scalable and sample-efficient deep reinforcement learning in high-dimensional observation spaces.

arXiv cs.LG · 7d ago

DeepGaLA: Neural Surrogates with Uncertainty for PDE Inverse Problems

DeepGaLA is a neural-network surrogate that provides uncertainty-aware predictions for inverse problems in partial differential equations. It achieves accuracy comparable to Gaussian-process surrogates while maintaining efficiency in high-dimensional parameter spaces and incorporating differential-equation constraints.

arXiv cs.LG · 7d ago

Mechanistic Study of Representation Retention in Continual Learning

A synthetic framework reveals that superposition increases over time, with transient dips at task boundaries that indicate boundary-specific interference. Higher feature sparsity promotes superposition, but forgetting is not inevitable as long as representation strength is maintained. Task-level effective rank grows with sparsity, showing that sparse conditions use more of the available capacity.

arXiv cs.LG · 7d ago

HEPTv2: End-to-End Efficient Point Transformer for Charged Particle Reconstruction

HEPTv2 achieves 98.6% tracking efficiency with a 0.8% fake rate on TrackML, with only 15 ms of inference time and 0.4 GB of memory per event. It is more efficient than prior transformer and graph-based methods and reduces latency by factors of 7 and 38–52, respectively, enabling real-time particle reconstruction at the HL-LHC.

arXiv cs.LG · 7d ago

Two-Stage Evolutionary Hyperparameter Optimization for PINNs

A two-stage evolutionary strategy improves Physics-Informed Neural Network performance by first screening hyperparameter candidates via low-fidelity training, then refining top candidates with gradient-based optimization. The approach reduces mean error significantly across Advection, Klein-Gordon, and Helmholtz equation problems under fixed computational budgets.

arXiv cs.LG · 7d ago

Topological Data Analysis for Real-Time Process Monitoring

A new method combines topological data analysis and machine learning to monitor high-dimensional dynamic processes. It represents time-series data as manifolds, uses topological descriptors to capture structure, and employs neural ordinary differential equations to model dynamic evolution. The approach effectively detects diverse events in industrial process data and outperforms reconstruction-based and trajectory-based alternatives.

arXiv cs.LG · 7d ago

SSH-Net: Deep Network for Failure Time Prediction under Competing Risks

SSH-Net is a structured deep neural network designed to predict failure time distribution functions under competing risks. It uses separate sub-networks for different covariate groups, improving accuracy by aligning neural structure with data hierarchy. The model is validated through simulation studies and applied to Titan GPU failure data.

arXiv cs.LG · 7d ago

Agentic Symbolic Search for PDE Solution Characterization

ASYS proposes a prior-guided framework that uses mathematical theory and evolutionary search to generate interpretable symbolic forms of PDE solutions. It produces analytical representations for complex problems like Allen-Cahn dynamics and Keller-Segel blow-up, offering new pathways for mathematical analysis beyond traditional methods.

arXiv cs.LG · 7d ago

Riemannian Sharpness Explains SGD's Bias Toward Flat Minima

This study introduces Riemannian sharpness, a reparametrization-invariant measure of flatness grounded in Fisher Information Matrix geometry. It proves SGD's stationary distribution concentrates at Riemannian-flat minima and links this geometric bias to generalization via a PAC-Bayes bound. Experiments on MNIST and CIFAR-10 show Riemannian sharpness better tracks generalization than Euclidean sharpness, with scaling consistent with theory.

arXiv cs.LG · 7d ago

RefRad2D Dataset Enables Scalable Spatial Grounding in Radiology

RefRad2D is a large-scale bilingual dataset of 1.2M CT and MR image-text pairs from clinical practice. RadGrounder, trained on this data, achieves competitive results in VQA and report generation, and its spatial grounding supervision does not degrade language quality.

arXiv cs.LG · 7d ago

How Safety-Aligned LLMs Interpret Mixed Compliance Demonstrations

A study finds benign and harmful compliance demonstrations are not interchangeable in language models. Benign demonstrations can either reduce or increase harmful compliance depending on the model, with preference optimization playing a key role in preventing harmful compliance. The research also reveals recency bias in demonstration ordering and varied model behaviors in handling refusals during in-context learning.

arXiv cs.LG · 7d ago

Probe-and-Refine Tuning Improves Coding Agent Performance

A new method called probe-and-refine tuning uses synthetic bug-fix probes to iteratively improve repository guidance files with single-shot LLM calls, without agent loops or tool use. On SWE-bench Verified, it achieves a 33.0% mean resolve rate, 14.5 percentage points higher than the initial static knowledge base, with the gain reflecting improved coverage rather than patch precision. The method lets agents use larger step budgets effectively, and performance remains stable across models when diagnostic output is sufficient.

arXiv cs.LG · 7d ago

Multi-Task Bayesian In-Context Learning Framework

A new multi-task in-context learning framework enables amortized hierarchical Bayesian inference by representing prior information as a prefix in datasets. The transformer model adapts predictions across prior families, matching oracle performance on diverse tasks while being significantly faster. It is validated on real-world spatiotemporal temperature prediction.

arXiv cs.LG · 7d ago

Calibration in MoE Models Under Distribution Shift

This paper examines how mixture-of-experts models maintain calibration under distribution shift. It finds that expert-level calibration ensures overall model calibration in hard-routed models but is insufficient for soft-routed models. The authors propose adversarial reweighting to penalize calibration errors in routed aggregates, improving the accuracy-calibration tradeoff across tasks and shifts.

arXiv cs.LG · 7d ago

Lie-Algebra Attention: Group Element Tokens in Neural Networks

Lie-Algebra Attention represents attention tokens as elements of matrix Lie groups and uses the closed-form Lie-algebra norm of relative poses as attention scores. This yields invariant, equivariant attention without representation-theoretic components, outperforming vector-token baselines on SE(2), SO(3), and Aff(2) with fewer parameters and no learned kernels.

arXiv cs.LG · 7d ago
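
A minimal sketch of the scoring idea in the Lie-Algebra Attention summary, restricted to SO(3) and written under several assumptions that the summary does not confirm: the score is taken to be the negative rotation angle of the relative pose R_i^T R_j (the norm of its axis-angle vector), rows are normalized with a softmax, and `temperature` plus all helper names are hypothetical. It is meant only to show why a score built from relative poses is unchanged when every token is moved by the same group element.

```python
import math

def rot_z(theta):
    """Rotation by theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(theta):
    """Rotation by theta radians about the x axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def log_norm_so3(r):
    """Rotation angle of r, i.e. the norm of the axis-angle vector of log(r)."""
    cos_angle = (r[0][0] + r[1][1] + r[2][2] - 1.0) / 2.0
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_angle)))

def attention_weights(tokens, temperature=1.0):
    """Row-wise softmax over assumed scores s_ij = -angle(R_i^T R_j) / temperature."""
    weights = []
    for r_i in tokens:
        scores = [-log_norm_so3(matmul(transpose(r_i), r_j)) / temperature
                  for r_j in tokens]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Three rotation-valued tokens, then the same tokens moved by a common rotation g.
tokens = [rot_z(0.0), rot_z(0.5), rot_x(1.2)]
g = matmul(rot_x(0.7), rot_z(-0.3))
moved = [matmul(g, r) for r in tokens]

w = attention_weights(tokens)
w_moved = attention_weights(moved)
```

Because (g R_i)^T (g R_j) = R_i^T R_j for any rotation g, `w` and `w_moved` agree up to floating-point error, with no learned kernel involved. Extending the sketch to SE(2) or Aff(2) would need the corresponding matrix logarithm, which is not shown here.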

UNIEGO: Proxy-Mediated Unified Egocentric Video Representation

UNIEGO introduces a hierarchical multi-teacher distillation framework that uses proxy models to mediate knowledge transfer from nine diverse teachers across viewpoints and modalities. The Selective Proxy Distillation (SPD) stage adaptively selects reliable proxies during training, improving representation quality and stability. UNIEGO achieves state-of-the-art results in action recognition, video retrieval, and action segmentation on ego-exo benchmarks.

arXiv cs.LG · 7d ago

How Transparent is DiffusionGemma?

DiffusionGemma has poor variable transparency due to high opaque serial depth, but this can be mitigated by an interpretable token bottleneck, which reduces serial depth to 1.1× that of Gemma 4. Algorithmic transparency is more challenging in diffusion models because tokens change dynamically, though case studies reveal novel phenomena such as non-chronological reasoning and intermediate-context reasoning. Overall, DiffusionGemma is found to be about as monitorable as Gemma 4.

arXiv cs.CL · 7d ago

RefRad2D Dataset Enables Scalable Spatial Grounding in Radiology

RefRad2D is a large-scale bilingual dataset of 1.2M CT and MR image-text pairs from clinical practice. Trained on this data, RadGrounder achieves competitive VQA results and performs spatial grounding without degrading language quality, enabling verifiable outputs in radiology.

arXiv cs.CL · 7d ago

H-RePlan: Hierarchical Recovery for Cross-Device Agent Systems

H-RePlan introduces a hierarchical replanning framework that separates device-local strategy recovery from global orchestrator replanning. Through scope-aware recovery in multi-device agent systems, it outperforms existing baselines in completion and instruction adherence at a lower token cost.