Training methods
arXiv cs.LG · 1d ago

Robust Diffusion Models via Divergence-Induced Weighted Denoising

A new training method replaces the MSE loss in diffusion models with an f-divergence-based transformation, yielding a robust surrogate that improves performance under data contamination. Using local divergence constructions under DDPM's Gaussian reverse kernel, the training objective reduces to a one-dimensional function of the denoising error; bounded-influence divergences suppress large errors and improve stability.
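As a rough illustration of the idea, the sketch below applies a bounded transform to the per-sample squared denoising error. The Welsch-type function `robust_surrogate`, its parameter `beta`, and the helper names are assumptions chosen for illustration; the paper's actual divergence-induced transform is not given in this summary.

```python
import math

def squared_error(eps_true, eps_pred):
    """Squared denoising error between true and predicted noise vectors."""
    return sum((a - b) ** 2 for a, b in zip(eps_true, eps_pred))

def robust_surrogate(err, beta=0.5):
    """Bounded transform of the squared error (assumed Welsch-type form).

    Behaves like err / 2 for small errors and saturates at 1 / beta.
    """
    return (1.0 - math.exp(-0.5 * beta * err)) / beta

def sample_weight(err, beta=0.5):
    """Derivative of the surrogate w.r.t. err: the implicit per-sample weight."""
    return 0.5 * math.exp(-0.5 * beta * err)

clean = squared_error([0.1, -0.2], [0.0, -0.1])    # small error
outlier = squared_error([8.0, -9.0], [0.0, -0.1])  # contaminated sample
for name, err in [("clean", clean), ("outlier", outlier)]:
    print(name, round(robust_surrogate(err), 4), round(sample_weight(err), 6))
```

Because the surrogate saturates, a contaminated sample contributes a bounded loss and a vanishing weight, whereas plain MSE would let it dominate the gradient.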

arXiv cs.LG · 1d ago

Introducing Quantum Measurement Temperature to Stabilize Hybrid QNN Training

A learnable scaling parameter, Quantum Measurement Temperature (QMT), rescales the quantum measurement outputs in hybrid quantum neural networks. This mitigates measurement-induced logit contraction, increasing gradient magnitude and improving training stability without altering the quantum circuit or the measurement operators. Experiments show improved logit separation, gradient strength, and classification accuracy on protein and image classification tasks.
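A minimal sketch of why a learnable scale can help, assuming the logits are formed as `tau * expvals` (the multiplicative form, the names `tau` and `loss_and_grad`, and the toy values are assumptions; the paper's exact parameterization is not stated here):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def loss_and_grad(expvals, label, tau):
    """Cross-entropy on logits tau * expvals, plus d(loss)/d(tau)."""
    probs = softmax([tau * z for z in expvals])
    loss = -math.log(probs[label])
    # d(loss)/d(logit_i) = p_i - y_i, and d(logit_i)/d(tau) = z_i
    grad_tau = sum((p - (1.0 if i == label else 0.0)) * z
                   for i, (p, z) in enumerate(zip(probs, expvals)))
    return loss, grad_tau

# Expectation values of Pauli-type observables lie in [-1, 1],
# so the raw logits are squeezed into a narrow range.
expvals = [0.30, 0.10, -0.20]
tau = 1.0
for step in range(50):
    loss, grad = loss_and_grad(expvals, label=0, tau=tau)
    tau -= 0.5 * grad  # plain gradient step on the temperature only
print(round(tau, 3), round(loss, 4))
```

In this toy setting the circuit outputs stay fixed; only the scale is trained, and the loss falls as `tau` grows and the softmax sharpens around the correct class.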

arXiv cs.LG · 1d ago

Stationary Robust Mean-Field Games under Model Mismatches

This paper introduces a stationary mean-field game framework that directly incorporates distributional model uncertainty into population-coupled dynamics. It establishes a robust dynamic programming principle, proves existence of a stationary robust equilibrium, and presents the first algorithm with convergence guarantees. The mean-field solution approximates finite-population equilibria and provides explicit non-asymptotic error bounds under model uncertainty.

arXiv cs.AI · 1d ago

Sparsity-Storage-Accuracy Tradeoff in Parsimoniously Activated Dictionary Learning

Parsimoniously activated dictionary learning (PADL) establishes a structured generative model with auxiliary latent variables, enabling maximum a posteriori estimation. This framework provides generalization guarantees and an analytical characterization of the tradeoff between sparsity, storage cost, and reconstruction accuracy, allowing data-driven hyperparameter estimation. The resulting algorithm achieves better reconstruction performance and accelerates inference in vision-language models.

arXiv cs.AI · 1d ago

HyperAdapter: Structured Hyperedge Adaptation for Vision Transformer Fine-Tuning

HyperAdapter introduces a hypergraph-based adapter that performs structured, group-aware adaptation in vision transformers by operating in hyperedge space rather than token space. It uses prototype-based assignments to build a soft hypergraph, aggregates token features into hyperedge representations, applies lightweight adaptation, and diffuses updates back via hypergraph structure, enabling explicit structural inductive bias while maintaining efficiency. Experiments show consistent performance gains over baseline PEFT methods, especially on tasks requiring structured reasoning.
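The four steps described above (assign, aggregate, adapt, diffuse) can be sketched in a few lines. This is an illustrative toy, not the paper's architecture: the dot-product similarity, the per-dimension `scale` standing in for the lightweight adapter, and all function names are assumptions.

```python
import math

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def hyperedge_adapter(tokens, prototypes, scale):
    n_edges, dim = len(prototypes), len(tokens[0])
    # 1) Soft hypergraph: each token gets a distribution over hyperedges.
    assign = [softmax([dot(t, p) for p in prototypes]) for t in tokens]
    # 2) Aggregate token features into hyperedge features (weighted mean).
    edges = []
    for k in range(n_edges):
        mass = sum(a[k] for a in assign)
        edges.append([sum(a[k] * t[d] for a, t in zip(assign, tokens)) / mass
                      for d in range(dim)])
    # 3) Lightweight adaptation in hyperedge space (assumed: per-dim scaling).
    deltas = [[scale[d] * e[d] for d in range(dim)] for e in edges]
    # 4) Diffuse the updates back to tokens as a residual.
    out = [[t[d] + sum(a[k] * deltas[k][d] for k in range(n_edges))
            for d in range(dim)] for a, t in zip(assign, tokens)]
    return out, assign

tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
prototypes = [[1.0, 0.0], [0.0, 1.0]]
adapted, assign = hyperedge_adapter(tokens, prototypes, scale=[0.1, 0.1])
```

The adapter parameters act on `n_edges` hyperedge vectors rather than on every token, which is where the efficiency claim would come from in a design like this.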

arXiv cs.AI · 1d ago

P4IR Framework Improves LLM-Based Code Compliance Accuracy

P4IR, a two-stage framework, uses supervised fine-tuning and Group Relative Policy Optimization to improve large language model-based automated code compliance systems. It reduces tree edit distance and token-level Levenshtein distance by up to 23.8% and 38.6%, respectively, outperforming leading LLMs such as Claude Opus, GPT-5.2, and GLM-4.7 in zero-shot and few-shot prompting settings, and reduces false positives by a small but statistically significant margin.
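To make the two ingredients concrete, the sketch below computes a token-level Levenshtein distance and the group-normalized advantages that Group Relative Policy Optimization builds on. Using the negative edit distance as the reward, and the example rule strings, are assumptions for illustration; the summary does not describe the paper's reward design.

```python
from statistics import mean, pstdev

def token_levenshtein(a, b):
    """Edit distance between two token lists (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def group_relative_advantages(rewards):
    """Normalize rewards within one group of samples for the same prompt."""
    mu, sigma = mean(rewards), pstdev(rewards)
    if sigma == 0:
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]

# Hypothetical target representation and three sampled model outputs.
reference = "IF door.width >= 800 THEN pass".split()
candidates = [
    "IF door.width >= 800 THEN pass".split(),
    "IF door.width > 800 THEN pass".split(),
    "door.width 800 pass".split(),
]
rewards = [-token_levenshtein(c, reference) for c in candidates]
advantages = group_relative_advantages(rewards)
print(rewards, [round(a, 3) for a in advantages])
```

Because advantages are computed relative to the group mean, no separate value network is needed; outputs closer to the reference than their siblings are reinforced.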

arXiv cs.LG · 1d ago

BIPC Framework Accelerates Mixed-Integer Optimization with Machine Learning

The BIPC framework reduces solution time for large-scale mixed-integer programs by identifying a backdoor subset of variables that drives computational complexity. Using supervised learning, it predicts values and intervals for the backdoor variables, then solves a reduced problem with these predictions fixed, achieving significant speedups with minimal loss in solution quality. This enables rapid, high-quality solutions under parameter perturbations in real-world systems such as power systems and supply chains.
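The effect of fixing a backdoor set can be seen on a toy 0/1 knapsack solved by enumeration. Everything here is a stand-in: the brute-force solver replaces a real MIP solver, and the `predicted_backdoor` values are hard-coded where the framework would use a learned predictor.

```python
from itertools import product

def solve_binary(values, weights, capacity, fixed=None):
    """Brute-force 0/1 knapsack; `fixed` maps variable index -> forced value."""
    fixed = fixed or {}
    free = [i for i in range(len(values)) if i not in fixed]
    best_obj, best_x, evaluated = float("-inf"), None, 0
    for bits in product((0, 1), repeat=len(free)):
        x = [0] * len(values)
        for i, v in fixed.items():
            x[i] = v
        for i, b in zip(free, bits):
            x[i] = b
        evaluated += 1
        if sum(w * xi for w, xi in zip(weights, x)) <= capacity:
            obj = sum(v * xi for v, xi in zip(values, x))
            if obj > best_obj:
                best_obj, best_x = obj, x
    return best_obj, best_x, evaluated

values, weights, capacity = [10, 7, 4, 3, 2], [5, 4, 3, 2, 1], 9

# Full problem: all 2**5 assignments are enumerated.
full_obj, full_x, full_evals = solve_binary(values, weights, capacity)

# Reduced problem: a (hypothetical) predictor fixes the two backdoor variables.
predicted_backdoor = {0: 1, 1: 1}
red_obj, red_x, red_evals = solve_binary(values, weights, capacity,
                                         fixed=predicted_backdoor)
print(full_evals, red_evals, full_obj, red_obj)
```

Fixing k binary variables shrinks the enumeration by a factor of 2**k. The reduced optimum can never beat the full optimum, so the quality loss depends entirely on how good the predictions are.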

arXiv cs.LG · 2d ago

Muon Optimizer: Power, Limits, and a River-Valley Theory

A new trajectory-level theory shows that, unlike gradient descent, Muon accelerates early in optimization along the information-bearing river direction but converges slowly near the bottom. With momentum, Muon's orthogonalized updates remove residual scale information, leading to overshooting and oscillation. The study advocates a two-stage approach, using Muon early and switching to gradient-descent-like optimizers later, for improved LLM training performance.
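The claim that orthogonalized updates discard scale information can be checked numerically. The sketch below uses the classical cubic Newton-Schulz iteration as a stand-in for Muon's orthogonalization step (practical implementations typically use a tuned higher-order variant); the helper names and the 2x2 example are assumptions.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def orthogonalize(g, steps=20):
    """Approximate the orthogonal polar factor of g via cubic Newton-Schulz."""
    norm = sum(v * v for row in g for v in row) ** 0.5
    x = [[v / norm for v in row] for row in g]  # singular values now <= 1
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        # X <- 1.5 X - 0.5 X X^T X pushes every singular value toward 1
        x = [[1.5 * xv - 0.5 * yv for xv, yv in zip(xr, yr)]
             for xr, yr in zip(x, xxtx)]
    return x

grad = [[3.0, 1.0], [0.0, 2.0]]
big_grad = [[1000.0 * v for v in row] for row in grad]
update, big_update = orthogonalize(grad), orthogonalize(big_grad)
print(update)
print(big_update)
```

Both gradients yield essentially the same update, so the step size no longer shrinks as gradients shrink near a minimum. That is consistent with the overshooting described above and with the suggestion to switch optimizers late in training.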