Research paper
arxiv arXiv cs.LG · 20h ago

P4IR: Reinforcement Learning Enhances Automated Code Compliance Systems

A new framework named P4IR addresses the issue of hallucinated rules in large language model-based automated code compliance systems. This two-stage approach first employs supervised fine-tuning to instill domain knowledge into the model. It then utilizes Group Relative Policy Optimization to improve the accuracy of generated high-level code skeletons. The method achieved reductions of up to 23.8% in tree edit distance and 38.6% in token-level Levenshtein distance compared to supervised fine-tuning baselines. Comparative analysis shows that P4IR outperforms leading models like Claude Opus, GPT-5.2, and Qwen-3-Max in zero-shot settings. Additionally, the reinforcement learning stage produced a statistically significant reduction in false positives. This combination of techniques offers a path toward more reliable automated code compliance.

arxiv arXiv cs.LG · 20h ago

Asymptotic Signal Subspace Recovery in Softmax Attention Models

This study investigates the theoretical principles behind softmax-attention mechanisms by analyzing a stylized model where a query vector is learned via stochastic gradient ascent. The authors exploit the model's symmetry to derive a population objective and characterize the limiting ordinary differential equation governing the learning dynamics. By employing tools from stochastic approximation and dynamical systems theory, they establish a rigorous connection between the stochastic learning algorithm and its deterministic limit. Under suitable high-dimensional scaling assumptions and standard step-size conditions, the research demonstrates that the learned query converges almost surely to the one-dimensional signal subspace. This convergence implies that the query asymptotically recovers the latent informative direction up to an intrinsic sign ambiguity. The findings provide a theoretical foundation for understanding attention as a signal extraction procedure in high-dimensional noisy environments.

arxiv arXiv cs.LG · 20h ago

QeHDC: Hyperdimensional Computing based on Quantum-enhanced binding and SuperClass Construction

The authors propose QeHDC, a novel framework extending classical Hyperdimensional Computing by leveraging quantum mechanical properties for enhanced computational efficiency. This approach utilizes a one-pass training method that employs sinusoidal and quantum encoding to project classical data into quantum amplitude states. A key innovation is the introduction of a reference-state-based quantum binding operation realized through specific quantum circuits. Additionally, the framework implements a density-matrix-based superclass generation strategy using eigenvalue decomposition to extract critical quantum state features. These mechanisms enable more accurate and robust class representations for classification tasks. Experimental evaluations on standard benchmark datasets demonstrate superior performance compared to traditional classical and existing quantum-enhanced methods. The results also highlight the approach's robustness to noise and computational feasibility, suggesting practical benefits for future quantum-inspired paradigms.

arxiv arXiv cs.LG · 20h ago

GaRA: Graph-aware LoRA Generation for Enhancing LLMs on Graph Tasks

Graph neural networks often exhibit limited transferability due to their tight coupling with dataset-specific feature spaces, whereas language models offer flexible generalization through a unified interface. Existing methods for adapting language models to graph tasks struggle to encode whole-graph information, which can lead to significant information loss and suboptimal understanding. To address this limitation, the authors propose GaRA, a novel Graph-aware LoRA generation model that implements a weight-level information injection paradigm. This approach generates task-specific weight updates conditioned on original graph structures, allowing them to interact directly with hidden representations. The method constrains the norm of these generated updates to inject whole-graph information while avoiding optimization bias inherent in standard weight generation. Empirical studies demonstrate that GaRA consistently outperforms baseline methods across various zero-shot graph learning tasks.

arxiv arXiv cs.LG · 20h ago

LLMs Determine Causal Structure via Difference-Making Logic

The article addresses the puzzle of how large language models acquire causal structure despite the limitations of standard formalisms like Judea Pearl's interventionist approach and the Neyman-Rubin framework. It argues that LLMs utilize a specific inductive method known as variational induction, which relies on difference-making logic. During training, models process vast amounts of text from diverse contexts to identify what constitutes a difference-maker or an indifference-maker within word sequences. The analysis examines how architectural components, specifically token embeddings and self-attention mechanisms, facilitate this variational induction process. This logical framework fundamentally parallels the experimental method used in science. In both cases, causal relations are derived by systematically varying individual circumstances to observe their influence on a phenomenon.

arxiv arXiv cs.LG · 20h ago

Escaping the Variance Trap: Jacobian-Free Dynamics for Root-Finding Bilevel Optimization

The authors identify a critical flaw termed the Variance Trap, which arises when stochastic root-finding problems are forced into minimization frameworks via squared residuals. Standard bilevel minimization algorithms require estimating hypergradients involving implicit Jacobians that act as noise amplifiers in stochastic settings. To address this, the paper formalizes Root-Finding Bilevel Optimization (RF-BO) as a distinct problem class to bypass this pathology. A Jacobian-free solution using Two-Time-Scale Stochastic Approximation (TTSA) is proposed to update directly along the root error. The study provides the first non-asymptotic convergence guarantees for TTSA in this setting under Markovian noise. Experiments show a 2.6% top-1 accuracy gain in SimCLR and 17x faster convergence in non-linear ODE control compared to baselines. Additionally, the framework achieves significantly improved entropy stability in reinforcement learning and an 11.1% quality improvement in generative modeling.

arxiv arXiv cs.LG · 21h ago

RQ-TTSA: Distribution-Aware Robust Bilevel Optimization with Quantile-Guided Huber Updates

The authors propose RQ-TTSA, a distribution-aware framework designed to address instability in bilevel optimization caused by heavy-tailed stochastic noise. Unlike existing variance-reduction techniques that rely on myopic magnitude checks, this method uses historical gradient buffers to estimate rolling quantiles for adaptive Huber-style clipping. This approach preserves local optimization geometry while strictly bounding effective variance under nonconvex-strongly convex assumptions with infinite-variance noise. Theoretical analysis derives a convergence rate of O(T^(-(p-1)/(3p-2))) that recovers optimal dependence on the heavy-tailed parameter p. Empirical evaluations across six diverse tasks, including vision benchmarks and offline reinforcement learning, show consistent outperformance over state-of-the-art baselines. RQ-TTSA eliminates divergence spikes and ensures stable convergence with negligible computational overhead of approximately 2.7 percent.

media r/LocalLLaMA · 21h ago

Colony: An Educational Simulation of LLM Attention Mechanisms Using Agent-Based Analogies

Colony is an educational resource designed to explain the attention mechanism of Large Language Models through simple analogies involving agents. The simulation places these agents within a board environment inspired by Conway's Game of Life. Each agent in the system represents a specific role within the self-attention block mechanism of an LLM. This visual approach allows users to observe how information flows and interacts during the attention process. The project is available as an open-source tool for those interested in exploring these concepts without complex mathematics. It serves as a fun and accessible way to understand the internal workings of transformer models.

arxiv arXiv cs.LG · 1d ago

AdaR: Adaptive Recurrent Message Passing for Graph Test-Time Computing

AdaR enables flexible test-time computing on graphs without parameter changes by using adaptive recurrence. It derives step dependence as a necessary and sufficient condition for convergence and incorporates normalized step information and representation-target relations into recurrent updates, guided by gradient-based supervision signals. Empirical results show AdaR outperforms strong baselines in both inductive and transductive graph learning settings.

arxiv arXiv cs.LG · 1d ago

Speech-Text Models Latently Transcribe Speech in Intermediate Layers

Interleaved speech-language models undergo an implicit transcription phase where spoken words become decodable as text tokens in intermediate layers, despite no speech recognition training. Up to 77% of the data shows the spoken word appearing as a top candidate text prediction, followed by a transition to text-based next-word prediction before returning to speech. This behavior is influenced by interleaved training and text LM initialization, and correlates with spoken knowledge performance.

arxiv arXiv cs.LG · 1d ago

The Scissors Effect: Resize Diversity Hurts Robust Surrogate Transfer

Input diversity, a common practice in transfer attacks, improves success on standard surrogates but reduces it on robust ones. This regime-dependent effect, called the Scissors Effect, is driven by gradient geometry, with resize operations degrading alignment in robust models. A training-free rule (CG-DI) adjusts diversity based on local gradient consistency to preserve attack success across surrogate types.

arxiv arXiv cs.LG · 1d ago

Generative Robust Optimisation Framework

Generative Robust Optimisation (GRO) introduces a deep generative model to define uncertainty sets, capturing nonlinear correlations, asymmetry, and multimodality. A five-point evaluation framework assesses neural network-based uncertainty sets across reconstruction fidelity, distribution matching, latent regularity, robust relevance, and computational tractability, with experiments validating GRO's effectiveness in production planning and facility location problems.

arxiv arXiv cs.LG · 1d ago

Introducing Quantum Measurement Temperature to Stabilize Hybrid QNN Training

A learnable scaling parameter called Quantum Measurement Temperature (QMT) is introduced to rescale quantum measurement outputs in hybrid quantum neural networks. This approach mitigates measurement-induced logit contraction, enhancing gradient magnitude and stability during training without altering the quantum circuit or measurement operators. Experiments show improved logit separation, gradient strength, and classification accuracy in protein and image classification tasks.

arxiv arXiv cs.LG · 1d ago

Deep material network for homogenization of piezoelectric composites

A piezoelectric deep material network (PDMN) is proposed to efficiently homogenize two-phase piezoelectric composites. The framework embeds electromechanical homogenization relations into its architecture, enabling physics-informed, semi-analytical predictions with over three orders of magnitude lower computational cost than direct numerical simulation, validated on PVDF-LiNbO3 and viscoelastic-piezoelectric composites under nonlinear loading.

arxiv arXiv cs.LG · 1d ago

Stationary Robust Mean-Field Games under Model Mismatches

This paper introduces a stationary mean-field game framework that directly incorporates distributional model uncertainty into population-coupled dynamics. It establishes a robust dynamic programming principle, proves existence of a stationary robust equilibrium, and presents the first algorithm with convergence guarantees. The mean-field solution approximates finite-population equilibria and provides explicit non-asymptotic error bounds under model uncertainty.