All articles — korshunov.ai

Fingerprinting agent behavior through procedural trajectories

We introduce a method to identify agents by their procedural behavior fingerprints, achieving 85.7% accuracy in attributing unseen trajectories to correct agents. Using ProcGrep, we analyze coding agent behavior in SWE-Bench, finding that models from similar release periods or distilled from each other exhibit closer behavioral similarity, with a Jensen-Shannon divergence of 0.25.

arxiv arXiv cs.LG · 9d ago
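
The summary above reports behavioral similarity as a Jensen-Shannon divergence. As a rough illustration only, the sketch below computes that divergence between the action-frequency distributions of two trajectories. The action labels, the example trajectories, and the choice to compare simple action frequencies are assumptions made for illustration; the summary does not say which distributions the paper compares.

```python
import math
from collections import Counter

def action_distribution(trajectory, vocabulary):
    """Relative frequency of each action in a trajectory (hypothetical featurization)."""
    counts = Counter(trajectory)
    total = len(trajectory)
    return [counts[action] / total for action in vocabulary]

def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical action vocabulary and trajectories for two agents.
VOCAB = ["search", "open", "edit", "test"]
agent_a = ["search", "open", "edit", "test", "edit", "test"]
agent_b = ["open", "edit", "edit", "edit", "test", "search"]

p = action_distribution(agent_a, VOCAB)
q = action_distribution(agent_b, VOCAB)
jsd = js_divergence(p, q)
print(f"JSD between agents: {jsd:.3f}")
```

With base-2 logarithms the divergence is symmetric and bounded between 0 (identical distributions) and 1 (disjoint support), which makes values from different agent pairs directly comparable.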

Analytic Torsion and Spectral Gap Capture Persistent-Laplacian Performance

A compact spectral representation using Betti numbers, spectral gap, and analytic torsion distills persistent Laplacians into three mathematically grounded invariants. This approach captures essential predictive signals from the full spectrum, outperforms it in some cases, and reduces computational overhead on datasets like MNIST, QM-3D, and SKEMPI WT.

arxiv arXiv cs.LG · 9d ago
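
To make the three invariants named above concrete, here is a minimal sketch that derives them from precomputed Laplacian eigenvalues. It assumes the textbook definitions: the Betti number is the multiplicity of the zero eigenvalue, the spectral gap is the smallest positive eigenvalue, and the log analytic torsion is an alternating, dimension-weighted sum of log pseudo-determinants. Sign and normalization conventions vary and the paper's exact definition is not given in the summary. The function names, tolerance, and example spectra (a path graph on three vertices) are illustrative assumptions.

```python
import math

TOL = 1e-9  # eigenvalues below this are treated as zero (assumed tolerance)

def betti_number(eigenvalues):
    """Multiplicity of the zero eigenvalue of a Laplacian."""
    return sum(1 for lam in eigenvalues if abs(lam) < TOL)

def spectral_gap(eigenvalues):
    """Smallest strictly positive eigenvalue, or None if there is none."""
    positive = [lam for lam in eigenvalues if lam > TOL]
    return min(positive) if positive else None

def log_pseudo_determinant(eigenvalues):
    """Sum of logs of the positive eigenvalues (log det with zeros removed)."""
    return sum(math.log(lam) for lam in eigenvalues if lam > TOL)

def log_analytic_torsion(spectra):
    """0.5 * sum_q (-1)^(q+1) * q * log det'(Laplacian_q), one common convention."""
    return 0.5 * sum(
        (-1) ** (q + 1) * q * log_pseudo_determinant(eigs)
        for q, eigs in spectra.items()
    )

# Eigenvalues of the 0- and 1-dimensional Laplacians of a 3-vertex path graph.
spectra = {0: [0.0, 1.0, 3.0], 1: [1.0, 3.0]}

features = {
    "betti_0": betti_number(spectra[0]),
    "betti_1": betti_number(spectra[1]),
    "gap_0": spectral_gap(spectra[0]),
    "log_torsion": log_analytic_torsion(spectra),
}
print(features)
```

In a persistent setting these quantities would be recomputed at each filtration scale, giving a short feature vector per scale instead of the full spectrum.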

Multi-Center Benchmark for Abdominal Disease Diagnosis from Non-Contrast CT

A new multi-center benchmark enables abdominal disease diagnosis and report generation from non-contrast CT by synthesizing contrast-enhanced findings. The dataset includes paired NCCT-CECT studies and reports from two centers, showing NCCT achieves average multi-organ AUCs of 69.1% internally and 63.1% externally. The benchmark and code are publicly released to support research into safer, contrast-free abdominal imaging workflows.

arxiv arXiv cs.LG · 9d ago

PACT: Small Language Model Deliberation for Reactive Reinforcement Learning

PACT combines a reactive RL policy with a 2B-parameter Small Language Model to generate and validate action plans. The SLM plan is executed directly if verified in simulation, bypassing the RL policy without retraining. PACT outperforms baselines on three increasingly difficult FrozenLake environments.

arxiv arXiv cs.LG · 9d ago
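
The verify-then-execute control flow described above can be sketched on a toy grid. This is an assumption-laden illustration, not the paper's implementation: the grid layout, the deterministic (non-slippery) dynamics, the function names, and the stand-in fallback policy are all hypothetical, and the proposed plan is a hard-coded list where a language model's output would go.

```python
# Toy FrozenLake-style map: S = start, F = frozen, H = hole, G = goal.
GRID = ["SFF",
        "FHF",
        "FFG"]
MOVES = {"left": (0, -1), "right": (0, 1), "up": (-1, 0), "down": (1, 0)}

def step(state, action):
    """Deterministic transition; moves off the grid are clamped to the edge."""
    row, col = state
    d_row, d_col = MOVES[action]
    new_row = min(max(row + d_row, 0), len(GRID) - 1)
    new_col = min(max(col + d_col, 0), len(GRID[0]) - 1)
    return (new_row, new_col)

def plan_is_valid(start, plan):
    """Simulate the plan; accept it only if it reaches G without entering H."""
    state = start
    for action in plan:
        if action not in MOVES:
            return False
        state = step(state, action)
        tile = GRID[state[0]][state[1]]
        if tile == "H":
            return False
        if tile == "G":
            return True
    return False

def choose_actions(start, proposed_plan, rl_policy):
    """Execute a verified plan directly; otherwise fall back to the reactive policy."""
    if plan_is_valid(start, proposed_plan):
        return "plan", list(proposed_plan)
    return "policy", [rl_policy(start)]

fallback_policy = lambda state: "right"  # stand-in for a trained RL policy
good_plan = ["right", "right", "down", "down"]
bad_plan = ["down", "right"]  # walks into the hole at (1, 1)

print(choose_actions((0, 0), good_plan, fallback_policy))
print(choose_actions((0, 0), bad_plan, fallback_policy))
```

Because the plan is checked in simulation before execution, the policy itself is never modified, which matches the summary's point that no retraining is required.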

ActiveSAM: Fast and Accurate Open-Vocabulary Segmentation

ActiveSAM is a training-free, zero-shot framework that enhances SAM 3 for open-vocabulary semantic segmentation by identifying an image-conditioned active class set. It improves the speed-accuracy tradeoff, outperforming SegEarth-OV3 by +1.4 mIoU on average and running up to 5.5x faster on large-vocabulary datasets, with strong robustness to image corruption.

arxiv arXiv cs.LG · 9d ago

Post-Hoc Falsification Operators Fail to Improve Accuracy in Small Code Models

A measurement study finds that 26 semantic post-hoc operators do not improve held-out accuracy over Best-of-N in frozen small code models. While some operators reduce compute usage or recover correct programs, none outperform BoN in accuracy, due to systemic limitations like coverage walls and consensus traps. An expression-layer recovery (M1) improves performance on HumanEval+ by 12 tasks, with no harm or leakage, and shows consistent results across model cells.

arxiv arXiv cs.LG · 9d ago

PPAD-hardness for min-max optimization of quadratic polynomials

Computing approximate stationary points of min-max optimization over the hypercube is PPAD-hard for quadratic polynomials. This result holds even for multilinear polynomials where each variable appears in at most three monomials, with inverse polynomial approximation factors. As a consequence, two-team zero-sum polymatrix games are proven to be PPAD-hard.

arxiv arXiv cs.LG · 9d ago

TuneJury: Open Metric for Music Generation Preference Alignment

TuneJury is an open, instance-level pairwise reward model that predicts music preference scores from text prompts and audio clips. It is trained on diverse human-preference data and demonstrates strong generalization, with anchor calibration enabling efficient post-hoc alignment for music generation systems.

arxiv arXiv cs.LG · 9d ago

Neural EXposure Interaction Search for Interpretable HTE

NEXIS identifies causal heterogeneous treatment effects by discovering Markov blankets in pre-treatment data. It leverages multi-modal, multi-view measurements and scalable representations with minimal human input, enabling interpretable and actionable policy insights from controlled experiments.

arxiv arXiv cs.LG · 9d ago

ROVE: Reinforcement Learning with Human Interventions for Humanoid Manipulation

ROVE enables humanoid Vision-Language-Action models to learn effective manipulation behaviors using imperfect human interventions. It combines a human-in-the-loop data collection pipeline with Optimistic Value Estimation and cross-embodiment supervision to prioritize high-value actions and improve robustness. ROVE outperforms baseline methods on real-world, contact-rich manipulation tasks through iterative rollout and intervention cycles.

arxiv arXiv cs.LG · 9d ago

Residual Connections Mitigate Gradient Issues in Deep Networks

A study uses multiplicative ergodic theory to analyze exploding and vanishing gradients in deep neural networks. It shows that residual connections affect the Lyapunov spectrum, as characterized by Furstenberg and Kifer, thereby stabilizing gradient flow during training.

arxiv arXiv cs.LG · 9d ago
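
The link between the top Lyapunov exponent and gradient growth can be illustrated numerically. The sketch below estimates the exponent for a product of random layer Jacobians in two simplified settings: plain layers with Jacobian W, and residual layers with Jacobian I + εW. It assumes linear layers (no nonlinearities), i.i.d. Gaussian weights with variance 1/dim, and an arbitrary residual scale ε = 0.1; none of these choices come from the paper.

```python
import math
import random

def top_lyapunov_exponent(dim, depth, residual_scale=None, seed=0):
    """Estimate the top Lyapunov exponent of a product of random Jacobians.

    Plain layer Jacobian:    W            (residual_scale is None)
    Residual layer Jacobian: I + eps * W  (residual_scale = eps)
    W has i.i.d. N(0, 1/dim) entries, drawn fresh for each layer.
    """
    rng = random.Random(seed)
    sigma = 1.0 / math.sqrt(dim)
    v = [1.0 / math.sqrt(dim)] * dim  # unit-norm starting vector
    log_growth = 0.0
    for _ in range(depth):
        # Apply a fresh random matrix W to v, one output row at a time.
        w_v = [sum(rng.gauss(0.0, sigma) * vj for vj in v) for _ in range(dim)]
        if residual_scale is None:
            out = w_v
        else:
            out = [vi + residual_scale * wi for vi, wi in zip(v, w_v)]
        norm = math.sqrt(sum(x * x for x in out))
        log_growth += math.log(norm)      # accumulate per-layer log growth
        v = [x / norm for x in out]       # renormalize to avoid under/overflow
    return log_growth / depth

plain = top_lyapunov_exponent(dim=4, depth=2000)
residual = top_lyapunov_exponent(dim=4, depth=2000, residual_scale=0.1)
print(f"plain: {plain:.4f}  residual: {residual:.4f}")
```

An exponent far from zero means the norm of a backpropagated vector shrinks or grows exponentially with depth; in this toy setting the residual form keeps the exponent much closer to zero than the plain form.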

Filtered Conformal Ellipsoids for Graph-Native Time Series

A new method called filtered conformal ellipsoids provides prediction sets for multivariate time series by using a frozen state-space filter to generate predictive means and covariances, then applying split-conformal calibration to Mahalanobis scores. The approach achieves coverage under dependence through contraction in an observable predictive-law quotient, with theoretical bounds derived under Gaussian-projection and observability conditions, and shows sharper ellipsoids on graph-native traffic benchmarks compared to static and non-filter baselines.

arxiv arXiv cs.LG · 9d ago
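
The calibration step described above, split-conformal calibration applied to Mahalanobis scores, can be sketched as follows. This is a simplified two-dimensional illustration under the standard exchangeability-style recipe; it does not reproduce the paper's dependence analysis. The predictive means and covariances would come from the frozen filter, and here they are hypothetical hard-coded values, as are the function names.

```python
import math

def mahalanobis_sq(y, mean, cov):
    """Squared Mahalanobis distance for a 2-D point with a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = y[0] - mean[0], y[1] - mean[1]
    # Uses the closed-form inverse of a 2x2 matrix: (1/det) * [[d, -b], [-c, a]].
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det

def conformal_threshold(scores, alpha):
    """Split-conformal quantile: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this coverage level
    return sorted(scores)[rank - 1]

def in_ellipsoid(y, mean, cov, threshold):
    """The prediction set is the ellipsoid {y : score(y) <= threshold}."""
    return mahalanobis_sq(y, mean, cov) <= threshold

# Hypothetical calibration set: (observation, predictive mean, predictive covariance).
identity = ((1.0, 0.0), (0.0, 1.0))
calibration = [((float(k), 0.0), (0.0, 0.0), identity) for k in range(1, 10)]
scores = [mahalanobis_sq(y, mean, cov) for y, mean, cov in calibration]

threshold = conformal_threshold(scores, alpha=0.25)
print(threshold, in_ellipsoid((7.0, 0.0), (0.0, 0.0), identity, threshold))
```

The same threshold is reused at every test step, while the ellipsoid's center and shape change with the filter's predictive mean and covariance.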

TokenPilot: Cache-Efficient Context Management for LLM Agents

TokenPilot reduces inference costs by 61% to 87% in both isolated and continuous modes, outperforming prior systems in cost efficiency while maintaining competitive performance. It uses ingestion-aware compaction and lifecycle-aware eviction to stabilize prompt prefixes and manage context segments efficiently.

arxiv arXiv cs.LG · 9d ago

A Mathematical Review of Shape Space Analysis in Machine Learning

This survey presents a mathematical framework for analyzing geometric data, integrating differential geometry, statistics, and machine learning. It outlines a unified pipeline for shape representation, geodesic metrics, statistical analysis, and geometry-aware learning, enabling the study of shape variability and structural trajectories across populations and time. Applications span biology, medicine, anthropology, and computer vision, highlighting challenges in handling nonlinear and unaligned geometric variation.

arxiv arXiv cs.LG · 9d ago

ExpRL: Exploratory RL for LLM Mid-Training

ExpRL introduces a novel mid-training approach for LLMs using human-written question-answer data as reward scaffolds. Instead of imitating reference solutions, it constructs problem-specific grading rubrics to reward intermediate reasoning steps, enabling better initialization for sparse-reward RL and outperforming SFT, sparse-reward GRPO, and self-distillation on math reasoning tasks.

arxiv arXiv cs.LG · 9d ago

HAMON: Passive Optical Forecasting Core

HAMON uses passive optical diffraction to generate forecasts, outperforming digital baselines on ETTm2 at all horizons and ETTh2 at all but the longest horizon. It achieves up to 14% lower MSE and operates without trainable digital mixing, relying instead on physical optical propagation.

arxiv arXiv cs.LG · 9d ago

KVEraser: Efficient Localized Context Erasing in LLMs

KVEraser enables efficient localized context erasing in large language models by replacing only the KV cache states of an erased span with learned steering states. It achieves near-full-recomputation performance on in-domain tasks while increasing latency by only 24%, versus a 17.6x increase for full recomputation, and delivers up to a 3-4x speedup on long-document QA tasks.

arxiv arXiv cs.LG · 9d ago

DP-FL Backdoor Attacks: RING Exploits Privacy for Malicious Signals

A new attack, RING, exploits differential privacy in federated learning to conceal backdoor signals while maximizing impact. It achieves 90.3% attack success against state-of-the-art defenses, up to 26.08x higher than baseline methods, and reveals a critical security gap in DP-FL caused by the inherent masking of malicious updates.

arxiv arXiv cs.LG · 9d ago

Phase in Neural Representations: An Internal Oppenheim-Lim Test

In image classifiers such as PRISM2D, GFNet, and ViT-B/16, phase rather than magnitude drives predictions in hidden layers. ResNet-50 reveals a latent sign code in late blocks, indicating that phase/sign identity exists across architectures but is expressed differently depending on activation and readout mechanisms.

arxiv arXiv cs.LG · 9d ago

HABC Improves RL Fine-Tuning of VLAs with Sparse Outcomes

Hierarchical Advantage-Weighted Behavior Cloning (HABC) enhances online RL fine-tuning of vision-language-action (VLA) models by using separate critic heads for viability and efficiency. It combines their outputs via a state-adaptive gate and applies per-transition weights, while intervention-aware credit assignment prevents supervision leakage. In real-robot experiments, HABC boosts success rates to 92%, 88%, and 38% on three bimanual tasks, surpassing SFT baselines of 36%, 44%, and 12%.