Research paper — korshunov.ai

Select-to-Act: Hierarchical Reinforcement Learning via Adaptive Language Guidance

The paper introduces HRLLI, a hierarchical reinforcement learning framework designed to improve sample efficiency by leveraging natural-language instructions. It addresses the limitation of existing approaches that treat instructions as static inputs, failing to account for their stage-dependent relevance in complex environments. The proposed method decomposes instructions into piecewise guidance elements that become relevant at different interaction stages. A novel Select-to-Act paradigm is formulated where a high-level semantic policy acts as a selector for the most relevant instruction piece based on the current state. This selected guidance conditions a low-level policy that executes environment actions, with both policies learned simultaneously to maximize augmented expected returns. Experiments on the RTFM benchmark demonstrate that HRLLI consistently outperforms strong instruction-conditioned RL baselines. The results confirm that explicitly modeling adaptive instruction selection significantly enhances reinforcement learning effectiveness.

arxiv arXiv cs.LG · 17h ago

SAFER: Reliability-Guided Adaptive Ensembling for Robust Test-Time Adaptation

The authors address the brittleness of test-time adaptation (TTA) under adversarially contaminated streams by proposing SAFER, a training-free framework for robust TTA. SAFER acts as an augmentation wrapper that replaces single-view predictions with a reliability-guided pooled predictor to stabilize online updates. For each test sample, the method generates stochastic augmentations and aggregates their outputs using correlation-weighted pooling combined with outlier detection. An adaptive-mixing extension is also introduced, which adjusts the weighting between original and augmented inputs based on feature disagreement signals to preserve clean performance. The researchers evaluated SAFER on PACS, VLCS, and OfficeHome benchmarks under PGD attacks at various rates. Results indicate that SAFER improves the resilience of TTA methods against adversarial attacks while maintaining competitive accuracy on clean data.
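
The pooling step described above can be illustrated with a minimal sketch. It is not the SAFER implementation: the function names, the use of Pearson correlation against the mean prediction, and the rule of zeroing out negatively correlated views as the "outlier" step are all assumptions made for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return cov / math.sqrt(va * vb)

def pooled_prediction(views):
    """Pool class-probability vectors from several augmented views.

    Assumption: each view is weighted by its correlation with the mean
    prediction, and views with non-positive correlation get weight 0.
    """
    n_classes = len(views[0])
    consensus = [sum(v[c] for v in views) / len(views) for c in range(n_classes)]
    weights = [max(0.0, pearson(v, consensus)) for v in views]
    total = sum(weights)
    if total == 0.0:
        # No view agrees with the consensus: fall back to uniform pooling.
        weights = [1.0 / len(views)] * len(views)
    else:
        weights = [w / total for w in weights]
    pooled = [sum(w * v[c] for w, v in zip(weights, views)) for c in range(n_classes)]
    return pooled, weights

# Two views agree on class 0; the third (e.g. a corrupted view) points to class 2.
views = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.2, 0.7]]
pooled, weights = pooled_prediction(views)
```

In this toy input the disagreeing third view receives zero weight, so the pooled prediction follows the two consistent views.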

arxiv arXiv cs.LG · 17h ago

Parsimoniously Activated Dictionary Learning Links Sparsity and Storage to Generative Models

The paper introduces parsimoniously activated dictionary learning (PADL), a method imposing global regularization on the number of activated dictionary atoms. It demonstrates that PADL is equivalent to maximum a posteriori estimation under a structured generative model with auxiliary latent variables. This equivalence enables the derivation of generalization guarantees that are difficult to obtain from the original formulation. The authors provide an analytical characterization of the tradeoff between sparsity, storage cost, and reconstruction accuracy. This framework allows for data-driven estimation of optimal hyperparameters without manual tuning. An efficient and interpretable PADL algorithm is developed based on this theoretical connection. Experimental results show improved reconstruction performance under comparable sparsity levels on visual benchmarks. The method also demonstrates practical utility in accelerating inference for vision-language models.

arxiv arXiv cs.LG · 18h ago

Multigrid Training for Molecular Generation using Graph Neural Networks

The authors introduce a multigrid training strategy to address the high computational costs and instability associated with modeling biochemical molecular systems at full resolution. This approach leverages low-resolution optimization to accelerate learning at higher resolutions by transferring parameters across different discretizations. For graph-based molecular representations, the method progressively transfers parameters from a coarse graph to increasingly finer graphs using biased random walk upsampling. In 3D molecular generation, structures are voxelized at multiple resolutions, allowing a coarse-resolution conditional Variational Autoencoder (CVAE) to be pretrained first. Shape-compatible convolutional parameters are then transferred from the coarse model to initialize a fine-resolution CVAE. Numerical experiments on receptor-conditioned 3D ligand generation demonstrate that this method accelerates convergence compared to training from scratch. Additionally, the study shows that multigrid training improves generalization capabilities for molecular generation tasks.

arxiv arXiv cs.LG · 18h ago

HyperAdapter: Structured Hyperedge Adaptation for Parameter-Efficient Fine-Tuning of Vision Transformers

The authors propose HyperAdapter, a novel parameter-efficient fine-tuning method that adapts vision transformers in hyperedge space rather than token space. Existing adapter-based methods typically perform independent adaptations for each token, which overlooks structured relationships and can lead to redundant updates. HyperAdapter constructs a soft hypergraph over ViT tokens using prototype-based assignments to enable group-aware adaptation. The architecture aggregates token features into latent hyperedge representations and applies lightweight bottleneck adaptation at the hyperedge level. Updates are then diffused back to individual tokens via the hypergraph incidence structure, injecting an explicit structural inductive bias. Extensive experiments across diverse visual benchmarks demonstrate that this approach consistently outperforms strong PEFT baselines under comparable parameter budgets. The results highlight significant gains on tasks requiring structured reasoning and suggest that the choice of adaptation space is a critical dimension for efficient transfer.

arxiv arXiv cs.LG · 18h ago

Shift-Invariant Variance Estimator Eliminates Minimization Bias in Local Learning Coefficient Estimation

Singular Learning Theory uses the Local Learning Coefficient to quantify neural network loss landscape geometry, but mean-energy estimators rely on an additive loss baseline: the loss at the local minimum. During off-equilibrium training phases, this minimum is unknown, and substituting it with noisy mini-batch losses introduces systematic minimization bias. The authors propose the Shift-Invariant Variance Estimator (SIVE) to structurally eliminate this unknown baseline through the variance operator. By combining SIVE with a correction derived from the Law of Total Variance, the method separates geometric loss fluctuations from evaluation noise. Controlled experiments on analytically tractable toy models demonstrate that SIVE recovers expected finite-temperature geometric signals where anchored mean estimators fail. Applied to deep neural networks, SIVE serves as a robust diagnostic for tracking structural phase transitions throughout training.
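
The property the summary relies on is that variance is unchanged by an additive shift, whereas a mean-minus-baseline quantity is not. The sketch below shows only that property on made-up loss values; it is not the SIVE estimator and omits the Law of Total Variance correction. The function names and numbers are hypothetical.

```python
import math
import statistics

def mean_energy_gap(losses, baseline):
    """Mean sampled loss minus an assumed baseline (sensitive to the baseline)."""
    return statistics.fmean(losses) - baseline

def loss_variance(losses):
    """Population variance of sampled losses (unchanged by any additive shift)."""
    return statistics.pvariance(losses)

# Hypothetical sampled losses, and the same samples with an unknown offset added.
losses = [0.50, 0.75, 1.00, 1.25]
shifted = [x + 2.0 for x in losses]

gap_true_baseline = mean_energy_gap(losses, 0.5)    # baseline known exactly
gap_wrong_baseline = mean_energy_gap(losses, 0.6)   # baseline misestimated
same_variance = math.isclose(loss_variance(losses), loss_variance(shifted))
```

A misestimated baseline moves the mean-based quantity by exactly the baseline error, while the variance of the shifted and unshifted samples is identical.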

arxiv arXiv cs.LG · 18h ago

Efficient CNN with Transfer Learning for Multi-Cancer Detection

A study introduces a lightweight convolutional neural network enhanced with transfer learning for multi-cancer detection using biomedical images. The architecture aims to reduce computational complexity while maintaining high classification performance for deployment in resource-constrained environments. Researchers evaluated the model on three tumor datasets comprising brain MRI and lung and kidney CT scans. The system achieved test accuracies of 90.85%, 98.64%, and 99.92% for brain, lung, and kidney cancer, respectively, under five-fold stratified cross-validation. Transfer learning was employed by pretraining on one cancer type and fine-tuning on others, requiring only 20 additional epochs to match scratch-trained models. The fine-tuning process updates only the classification part of the CNN and takes approximately 0.014 seconds per image per epoch on an NVIDIA GeForce GTX 960. Comparative evaluations demonstrate that this model outperforms state-of-the-art architectures such as Xception, VGG16, VGG19, MobileNetV2, and DenseNet121.

arxiv arXiv cs.LG · 19h ago

P4IR: Reinforcement Learning Enhances Automated Code Compliance Systems

A new framework named P4IR addresses the issue of hallucinated rules in large language model-based automated code compliance systems. This two-stage approach first employs supervised fine-tuning to instill domain knowledge into the model. It then utilizes Group Relative Policy Optimization to improve the accuracy of generated high-level code skeletons. The method achieved reductions of up to 23.8% in tree edit distance and 38.6% in token-level Levenshtein distance compared to supervised fine-tuning baselines. Comparative analysis shows that P4IR outperforms leading models like Claude Opus, GPT-5.2, and Qwen-3-Max in zero-shot settings. Additionally, the reinforcement learning stage produced a statistically significant reduction in false positives. This combination of techniques offers a path toward more reliable automated code compliance.

arxiv arXiv cs.LG · 19h ago

Asymptotic Signal Subspace Recovery in Softmax Attention Models

This study investigates the theoretical principles behind softmax-attention mechanisms by analyzing a stylized model where a query vector is learned via stochastic gradient ascent. The authors exploit the model's symmetry to derive a population objective and characterize the limiting ordinary differential equation governing the learning dynamics. By employing tools from stochastic approximation and dynamical systems theory, they establish a rigorous connection between the stochastic learning algorithm and its deterministic limit. Under suitable high-dimensional scaling assumptions and standard step-size conditions, the research demonstrates that the learned query converges almost surely to the one-dimensional signal subspace. This convergence implies that the query asymptotically recovers the latent informative direction up to an intrinsic sign ambiguity. The findings provide a theoretical foundation for understanding attention as a signal extraction procedure in high-dimensional noisy environments.

arxiv arXiv cs.LG · 19h ago

QeHDC: Hyperdimensional Computing based on Quantum-enhanced binding and SuperClass Construction

The authors propose QeHDC, a novel framework extending classical Hyperdimensional Computing by leveraging quantum mechanical properties for enhanced computational efficiency. This approach utilizes a one-pass training method that employs sinusoidal and quantum encoding to project classical data into quantum amplitude states. A key innovation is the introduction of a reference-state-based quantum binding operation realized through specific quantum circuits. Additionally, the framework implements a density-matrix-based superclass generation strategy using eigenvalue decomposition to extract critical quantum state features. These mechanisms enable more accurate and robust class representations for classification tasks. Experimental evaluations on standard benchmark datasets demonstrate superior performance compared to traditional classical and existing quantum-enhanced methods. The results also highlight the approach's robustness to noise and computational feasibility, suggesting practical benefits for future quantum-inspired paradigms.

arxiv arXiv cs.LG · 19h ago

GaRA: Graph-aware LoRA Generation for Enhancing LLMs on Graph Tasks

Graph neural networks often exhibit limited transferability due to their tight coupling with dataset-specific feature spaces, whereas language models offer flexible generalization through a unified interface. Existing methods for adapting language models to graph tasks struggle to encode whole-graph information, which can lead to significant information loss and suboptimal understanding. To address this limitation, the authors propose GaRA, a novel Graph-aware LoRA generation model that implements a weight-level information injection paradigm. This approach generates task-specific weight updates conditioned on original graph structures, allowing them to interact directly with hidden representations. The method constrains the norm of these generated updates to inject whole-graph information while avoiding optimization bias inherent in standard weight generation. Empirical studies demonstrate that GaRA consistently outperforms baseline methods across various zero-shot graph learning tasks.

arxiv arXiv cs.LG · 19h ago

LLMs Determine Causal Structure via Difference-Making Logic

The article addresses the puzzle of how large language models acquire causal structure despite the limitations of standard formalisms like Judea Pearl's interventionist approach and the Neyman-Rubin framework. It argues that LLMs utilize a specific inductive method known as variational induction, which relies on difference-making logic. During training, models process vast amounts of text from diverse contexts to identify what constitutes a difference-maker or an indifference-maker within word sequences. The analysis examines how architectural components, specifically token embeddings and self-attention mechanisms, facilitate this variational induction process. This logical framework fundamentally parallels the experimental method used in science. In both cases, causal relations are derived by systematically varying individual circumstances to observe their influence on a phenomenon.

arxiv arXiv cs.LG · 19h ago

Escaping the Variance Trap: Jacobian-Free Dynamics for Root-Finding Bilevel Optimization

The authors identify a critical flaw termed the Variance Trap, which arises when stochastic root-finding problems are forced into minimization frameworks via squared residuals. Standard bilevel minimization algorithms require estimating hypergradients involving implicit Jacobians that act as noise amplifiers in stochastic settings. To address this, the paper formalizes Root-Finding Bilevel Optimization (RF-BO) as a distinct problem class to bypass this pathology. A Jacobian-free solution using Two-Time-Scale Stochastic Approximation (TTSA) is proposed to update directly along the root error. The study provides the first non-asymptotic convergence guarantees for TTSA in this setting under Markovian noise. Experiments show a 2.6% top-1 accuracy gain in SimCLR and 17x faster convergence in non-linear ODE control compared to baselines. Additionally, the framework achieves significantly improved entropy stability in reinforcement learning and an 11.1% quality improvement in generative modeling.
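
To make the idea of updating "directly along the root error" concrete, here is a toy two-time-scale stochastic approximation loop. It is a sketch under stated assumptions, not the paper's algorithm: the scalar problem (inner root y = x, outer root y = 2), the step-size exponents, and the Gaussian noise are all chosen for illustration. Note that no Jacobian of the inner solution with respect to x is ever formed.

```python
import random

def ttsa_root_finding(steps=5000, seed=0, noise_std=0.5):
    """Toy two-time-scale stochastic approximation on a scalar problem.

    Inner problem: find y with y - x = 0 (fast variable).
    Outer problem: find x with y - 2 = 0 (slow variable).
    Both residuals are observed with additive Gaussian noise.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    for k in range(steps):
        alpha = 1.0 / (k + 1)          # slow step size for the outer variable
        beta = 1.0 / (k + 1) ** 0.6    # fast step size for the inner variable
        inner_residual = (y - x) + rng.gauss(0.0, noise_std)
        outer_residual = (y - 2.0) + rng.gauss(0.0, noise_std)
        # Move each variable against its own noisy residual; no squared
        # residual and no hypergradient are involved.
        y -= beta * inner_residual
        x -= alpha * outer_residual
    return x, y

x_final, y_final = ttsa_root_finding()
```

Both variables should settle near 2 in this toy setting. By contrast, minimizing the squared outer residual would require differentiating the inner solution with respect to x, which is the Jacobian term the summary describes as a noise amplifier.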

arxiv arXiv cs.LG · 19h ago

RQ-TTSA: Distribution-Aware Robust Bilevel Optimization with Quantile-Guided Huber Updates

The authors propose RQ-TTSA, a distribution-aware framework designed to address instability in bilevel optimization caused by heavy-tailed stochastic noise. Unlike existing variance-reduction techniques that rely on myopic magnitude checks, this method uses historical gradient buffers to estimate rolling quantiles for adaptive Huber-style clipping. This approach preserves local optimization geometry while strictly bounding effective variance under nonconvex-strongly convex assumptions with infinite-variance noise. Theoretical analysis derives a convergence rate of O(T^(-(p-1)/(3p-2))) that recovers optimal dependence on the heavy-tailed parameter p. Empirical evaluations across six diverse tasks, including vision benchmarks and offline reinforcement learning, show that the method consistently outperforms state-of-the-art baselines. RQ-TTSA eliminates divergence spikes and ensures stable convergence with negligible computational overhead of approximately 2.7%.
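
The rolling-quantile clipping idea can be sketched as follows. This is an illustrative reading of the summary, not the RQ-TTSA update rule: the class name, window size, quantile level, and the choice to rescale the whole gradient to the threshold norm are assumptions.

```python
import math
from collections import deque

class RollingQuantileClipper:
    """Clip gradients to a threshold taken from a rolling quantile of past norms."""

    def __init__(self, window=100, q=0.9):
        self.norms = deque(maxlen=window)  # buffer of recent gradient norms
        self.q = q                         # quantile level (assumed value)

    def threshold(self):
        """Empirical q-quantile of buffered norms; infinite while the buffer is empty."""
        if not self.norms:
            return math.inf
        ordered = sorted(self.norms)
        idx = min(len(ordered) - 1, int(self.q * len(ordered)))
        return ordered[idx]

    def clip(self, grad):
        """Return grad unchanged if its norm is within the threshold, else rescale it."""
        norm = math.sqrt(sum(g * g for g in grad))
        tau = self.threshold()
        self.norms.append(norm)  # record the raw norm for future thresholds
        if norm <= tau or norm == 0.0:
            return list(grad)
        scale = tau / norm
        return [g * scale for g in grad]

clipper = RollingQuantileClipper(window=5, q=0.5)
history = [clipper.clip(g) for g in ([1.0, 0.0], [2.0, 0.0], [3.0, 0.0])]
spike = clipper.clip([100.0, 0.0])  # heavy-tailed outlier, clipped to the rolling median
```

Because the threshold comes from the recent distribution of norms rather than a fixed constant, ordinary gradients pass through largely unchanged while a single extreme sample is scaled down to a typical magnitude.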

media r/LocalLLaMA · 20h ago

Colony: An Educational Simulation of LLM Attention Mechanisms Using Agent-Based Analogies

Colony is an educational resource designed to explain the attention mechanism of Large Language Models through simple analogies involving agents. The simulation places these agents within a board environment inspired by Conway's Game of Life. Each agent in the system represents a specific role within the self-attention block of an LLM. This visual approach allows users to observe how information flows and interacts during the attention process. The project is available as an open-source tool for those interested in exploring these concepts without complex mathematics. It serves as a fun and accessible way to understand the internal workings of transformer models.

arxiv arXiv cs.LG · 23h ago

A Differentiable Atari VCS for Explainable AI

A fully differentiable emulator of the Atari 2600 VCS is presented, reproducing all 64 ALE games with bit-for-bit accuracy in RAM and screen output. The system enables gradient-based explainable AI by providing a complex, fully known ground truth, with both Julia and JAX implementations validated against a reference emulator and supporting high-throughput training on GPUs.

arxiv arXiv cs.LG · 23h ago

AdaR: Adaptive Recurrent Message Passing for Graph Test-Time Computing

AdaR enables flexible test-time computing on graphs without parameter changes by using adaptive recurrence. It derives step dependence as a necessary and sufficient condition for convergence and incorporates normalized step information and representation-target relations into recurrent updates, guided by gradient-based supervision signals. Empirical results show AdaR outperforms strong baselines in both inductive and transductive graph learning settings.

arxiv arXiv cs.LG · 23h ago

Speech-Text Models Latently Transcribe Speech in Intermediate Layers

Interleaved speech-language models undergo an implicit transcription phase in which spoken words become decodable as text tokens in intermediate layers, despite the models never being trained for speech recognition. In up to 77% of cases, the spoken word appears as a top candidate text prediction, followed by a transition to text-based next-word prediction before the representation returns to speech. This behavior is influenced by interleaved training and text LM initialization, and correlates with spoken knowledge performance.

arxiv arXiv cs.LG · 23h ago

Fed-CausalDiff: Decoupled Synchronization for Federated Do-Simulation

Fed-CausalDiff introduces a federated causal diffusion framework that enables do-simulation and policy evaluation in decentralized settings. It decomposes latent state evolution into global and local components, allowing decoupled synchronization to reduce communication cost while maintaining accurate causal inference.

arxiv arXiv cs.LG · 23h ago

Prompt-Side Preprocessing Enhances Edge AI Accuracy

A structured prompt framework improves local LLM accuracy in environmental monitoring by transforming raw sensor data into enriched textual representations. Evaluations on indoor and outdoor datasets show local model accuracy increases from 50.9% to 81.7% indoors and from 63.7% to 89.3% outdoors with enriched prompts, while maintaining low latency near 0.22 seconds in no-chain-of-thought mode.
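
As a rough illustration of what prompt-side enrichment of a raw sensor value might look like, the sketch below turns a reading and its recent history into a short descriptive sentence. The function name, the sensor name and unit, and the choice of "comparison with the recent mean" as the enrichment are hypothetical and not taken from the paper.

```python
from statistics import fmean

def enrich_reading(name, unit, value, history):
    """Describe a raw sensor value relative to the mean of recent readings."""
    if not history:
        return f"{name}: {value:.1f} {unit} (no recent history)."
    baseline = fmean(history)
    delta = value - baseline
    if delta > 0:
        trend = "above"
    elif delta < 0:
        trend = "below"
    else:
        trend = "equal to"
    return (
        f"{name}: {value:.1f} {unit}, {abs(delta):.1f} {unit} {trend} "
        f"the mean of the last {len(history)} readings ({baseline:.1f} {unit})."
    )

# Hypothetical reading: the text, not the bare number, is what goes into the prompt.
line = enrich_reading("CO2", "ppm", 900.0, [600.0, 700.0, 800.0])
```

The resulting sentence gives a local model context (baseline and direction of change) that a bare number lacks, which is the kind of transformation the summary credits for the accuracy gains.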