Reasoning models — korshunov.ai

Machine Learning Predicts Gestational Age from Fetal MRI

A machine learning pipeline using multi-modal fetal MRI data predicts gestational age at birth with an R² of 0.13 and a mean absolute error of 2.74 weeks. It achieves 0.77 accuracy, 0.59 sensitivity, and 0.82 specificity, with cervical length and placental T2* statistics as key features. This work presents a proof of concept for predicting preterm birth using MRI and machine learning.

arXiv cs.LG · 7d ago
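
For readers less familiar with the metrics quoted in the fetal MRI summary above, the following is a minimal sketch of how R², mean absolute error, accuracy, sensitivity, and specificity are conventionally computed. The function names and the toy numbers are hypothetical and for illustration only; they are not taken from the paper and do not reproduce its results.

```python
from typing import Dict, Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity for 0/1 labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # fraction of true positives that are detected
        "specificity": tn / (tn + fp),  # fraction of true negatives that are detected
    }


# Hypothetical toy data: gestational ages in weeks and preterm (1) / term (0) labels.
weeks_true = [38.0, 40.0, 36.0]
weeks_pred = [39.0, 38.0, 36.0]
labels_true = [1, 1, 0, 0, 0]
labels_pred = [1, 0, 0, 0, 1]

mae = mean_absolute_error(weeks_true, weeks_pred)
r2 = r_squared(weeks_true, weeks_pred)
metrics = binary_metrics(labels_true, labels_pred)
```

A low R² alongside moderate accuracy, as in the summary, is possible because the regression and classification metrics measure different things: R² compares squared error against always predicting the mean, while accuracy only checks which side of a threshold each prediction falls on.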

Computational Methods for Cell-Free DNA in Multi-Cancer Early Detection

This review outlines computational methods from 2022 to 2025 for detecting multiple cancers from blood-based cell-free DNA. It evaluates fragmentomics and epigenetic analysis, covering statistical, machine learning, and deep learning approaches, with a focus on biological interpretability, validation, and clinical readiness. Multimodal ensemble methods show the highest promise for clinical use, but standardized evaluation protocols are needed for reliable comparison and future progress.

arXiv cs.LG · 7d ago

Off-Policy Evaluation for MNAR Rewards in MDPs

We propose an off-policy evaluation method for finite-horizon MDPs with rewards missing not at random. Our approach uses a reward-dependent propensity model and a bridge function to recover conditional mean rewards without modeling the MNAR mechanism, achieving consistency and finite-sample error bounds. Experiments on simulated and MIMIC-III Sepsis data show superior performance over existing methods.

arXiv cs.LG · 7d ago

MAMO: Multi-Agent System for Multi-Objective Constrained Optimization

MAMO introduces a multi-agent reinforcement learning approach to address the challenge of balancing cost minimization and constraint satisfaction in dynamic environments. It decouples task execution from reward weight selection, treating the choice of weights as a learning problem to enable more autonomous and robust solutions.

arXiv cs.LG · 7d ago

Boundary Embedding Shaping for Graph Structural Disentanglement

Boundary Embedding Shaping (BES) addresses graph structural entanglement by selectively suppressing spurious neighbor correlations near class boundaries. BES uses adaptive contrastive learning to enhance boundary discrimination, improving GCN node classification by an average of 3.3% (up to 5.0% on WikiCS) and achieving superior link prediction accuracy.

arXiv cs.LG · 7d ago

Statistical Properties of Training and Generalization

The article examines deep learning's deviation from classical statistical intuitions, emphasizing neural scaling laws and their interaction with physical constraints and inductive biases in machine learning applications.

arXiv cs.LG · 7d ago

Model-Driven Approach for RL Environment Families

A model-driven approach generates families of reinforcement learning environments using a hybrid genetic algorithm. Environment variants are created through model transformations guided by a state-of-the-art model transformation engine, enabling scalable and error-resistant development. The method is validated in wildfire mitigation and curriculum learning scenarios.

arXiv cs.LG · 7d ago

Recurrent neural networks approximate continuous functions

A single ReLU recurrent neural network with fixed weights and hidden dimension can uniformly approximate any continuous function on [-1,1] as its runtime increases. This is achieved via a new model, the Turing machine with neural units (TMNU), which balances algorithmic flexibility with bounded simulation by RNNs. The convergence rates match polynomial approximation rates, and minimax lower bounds confirm that runtime is an essential, unavoidable resource.

arXiv cs.LG · 7d ago

Hybrid modeling predicts microbial dynamics in soil systems

A new hybrid modeling framework uses genomic data and neural networks to predict biokinetic parameters in soil organic matter turnover models. It incorporates ecological constraints to ensure realistic microbial dynamics, even for unobserved variables, and outperforms existing methods on both synthetic and real datasets with minimal training data.

arXiv cs.LG · 7d ago

Critical Percolation as a Synthetic Data Model for Interpretability

A new synthetic dataset based on critical mean-field percolation clusters provides a realistic, analytically tractable model with hierarchical structure. It features sparse, fractal clusters with power-law size distributions and latent variables that generate target values via a taxonomic hierarchy. Neural networks can linearly decode these ground-truth latent variables from activations, demonstrating strong interpretability.

arXiv cs.LG · 7d ago

Train, Retrieve, or Both? Head-to-Head on Statutory Citation for Ontario RTA

A four-arm comparison shows that retrieval is essential for accurate statutory citation under the Ontario Residential Tenancies Act. The SFT+RAG hybrid model achieves 0.481 exact-match with zero hallucinations, outperforming base and SFT-only models, and matches a pipeline using larger, specialized models without needing more data or larger training sets. Results are based on a small, human-verified real-world evaluation set and are preliminary.

arXiv cs.LG · 7d ago

De-biased VLM-as-3D-Judge Protocol for Furniture Generation

A de-biased VLM-based judge protocol specializes TRELLIS on furniture generation using lightweight adaptation. The protocol addresses failure modes like image overload and geometry-hiding, with calibration showing 0.83–1.0 win rates and base-vs-base symmetry at 0.5. Among six adaptation methods, conditioner repair under severe degradation achieves parity with the base model, while no method exceeds a 65% win-rate target.

arXiv cs.LG · 7d ago

CRAX: Fast Safe Reinforcement Learning Benchmarking

CRAX introduces a high-fidelity, fast safety benchmark for reinforcement learning using MuJoCo XLA. It achieves up to 100x speedups over CPU-based benchmarks via vectorization and hardware acceleration, featuring six environment suites and three agent-specific tasks across three difficulty levels. Evaluation of six safe RL methods shows no single approach dominates, highlighting trade-offs between performance and safety, with curriculum learning and safety transfer improving results.

arXiv cs.CL · 7d ago

CoT Transformers Can Efficiently Simulate Word RAM Algorithms

Chain-of-thought (CoT) transformers can efficiently simulate Word RAM algorithms with only poly-logarithmic overhead. This efficiency improves to log-square for flat instruction sets and logarithmic for multiplication-free ones, contrasting with prior Turing machine simulations that require quadratic overhead.

arXiv cs.CL · 7d ago

Sentiment Analysis Misses Key Customer Outcomes

A study of 70,450 support conversations found that sentiment analysis poorly captures customer satisfaction, with GPT-5.4-based satisfaction estimates correlating 0.47 with ratings versus sentiment's 0.36. The model also revealed 44% of conversations where tone and satisfaction diverge, exposing 'tolerated friction'—satisfied customers still reporting fixable issues—unseen by sentiment analysis.

arXiv cs.CL · 7d ago

TerraMARS: Small Language Model Pipeline for Mars Terraforming Literature

TerraMARS is an end-to-end pipeline that uses a domain-adapted small language model to extract structured information from Mars science literature. It converts unstructured text into JSON format and supports Mars terraforming-related question answering, enabling integration into habitability modeling and digital twin applications. The pipeline uses Google Gemma 3 1B fine-tuned with QLoRA on Mars-specific datasets, though further work is needed to improve accuracy and factual consistency.

arXiv cs.CL · 7d ago

NEST: Dataset for Narrative Event Structures in Long Videos

NEST introduces a dataset of 1005 full-length movies, each annotated with 102 multimodal narrative events grounded in visual, dialogue, and audio content. The dataset captures event relationships such as temporal ordering, hierarchy, and long-range dependencies, with benchmark tasks showing low performance in event detection and localization, and higher performance in event relation extraction after fine-tuning.

arXiv cs.CL · 7d ago

FineREX: Fine-Tuned NER-RE for Human Smuggling Knowledge Graphs

FineREX is a domain-specific knowledge graph pipeline that uses a fine-tuned LLM for named entity and relationship extraction. It outperforms general-purpose models by 15.50% in entity F1-score and 31.46% in relationship F1-score, reducing legal noise by nearly half and node duplication from 17.78% to 11.17%. The system also cuts end-to-end processing time by 50.0% by eliminating redundant steps.

arXiv cs.CL · 7d ago

NRITYAM: Benchmark for Cultural Comprehension in Dance

NRITYAM is a multilingual benchmark with 9,260 question-answer pairs across 12 languages, designed to evaluate language models' cultural understanding of global dance traditions. Developed through collaboration with native dance artists and speakers, it offers a comprehensive assessment of AI's ability to grasp traditional performing arts in diverse socio-cultural contexts.

arXiv cs.CL · 7d ago

Sequential DPO Shows Variable Preference Impact Across Settings

A study of sequential Direct Preference Optimization finds that later training does not uniformly degrade earlier learned preferences. The effect varies by objective relationship, signal strength, and training order, ranging from partial degradation to positive transfer. Pair-level analysis reveals heterogeneous changes, with high-confidence preference pairs sometimes improving despite aggregate metric stability.
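
As background for the pair-level analysis mentioned above, here is a minimal sketch of the standard per-pair DPO objective, assuming the usual formulation with a frozen reference model and a temperature `beta`. The function name, argument names, and example log-probabilities are hypothetical; this illustrates the general loss, not the study's sequential training setup or its analysis code.

```python
import math
from typing import Tuple


def dpo_pair_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> Tuple[float, float]:
    """Return (margin, loss) for a single preference pair.

    margin = beta * [(log pi(y_w) - log pi_ref(y_w)) - (log pi(y_l) - log pi_ref(y_l))]
    loss   = -log(sigmoid(margin))
    """
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_shift - rejected_shift)

    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of m
    # so the exponential never overflows.
    if margin > 0:
        loss = math.log1p(math.exp(-margin))
    else:
        loss = -margin + math.log1p(math.exp(margin))
    return margin, loss


# Hypothetical pair where the policy is identical to the reference: margin is zero.
neutral_margin, neutral_loss = dpo_pair_loss(-5.0, -6.0, -5.0, -6.0, beta=0.5)

# Hypothetical pair where the policy raised the chosen response by 1 nat and
# lowered the rejected response by 1 nat relative to the reference.
improved_margin, improved_loss = dpo_pair_loss(-4.0, -7.0, -5.0, -6.0, beta=0.5)
```

Tracking the per-pair margin before and after a later training stage is one way to see individual pairs improve or degrade even when the average loss barely moves.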