Training methods
arxiv arXiv cs.CL · 10d ago

Contrastive-Difference CKA Reveals Concept-Specific Alignment Across LLM Architectures

A training-free diagnostic, contrastive-difference CKA (CKA_Delta), identifies concept-specific structural alignment across language model architectures. It detects geometric convergence and functional transfer across six concept domains, including non-instructional tasks, with significant discrimination where standard CKA fails. Results suggest universality may strengthen with model scale, though further validation is needed.

arxiv arXiv cs.AI · 10d ago

MA-SBI: Calibration-Free SBI via Side-Channel Guidance

MA-SBI introduces a calibration-free simulation-based inference framework that uses side-channel text, like regime labels or instructions, to correct for simulator misspecification. It employs a learned corrector to apply observation-space shifts before posterior inference, without needing ground-truth parameter pairs or retraining. On hide-the-calibration benchmarks, MA-SBI matches the oracle posterior with text alone, outperforming RoPE under limited data, and shows robustness on real-world epidemiological and cognitive-science datasets.

arxiv arXiv cs.LG · 10d ago

Adaptive Functional Gradient Descent with Convergence Guarantees

We propose a new functional gradient descent algorithm that adapts its representation during optimization. The method achieves convergence to a stationary point under smooth losses and to a global minimizer under smoothness and a Polyak-Lojasiewicz condition, despite using finite-dimensional approximations. It outperforms both fixed-approximation FGD and neural network baselines in regression, PDE solving, and computer vision tasks.

arxiv arXiv cs.LG · 10d ago

Unified Causal-Origin Taxonomy of Distributional Shifts in RL

This paper proposes a unified causal-origin taxonomy for distributional shifts in reinforcement learning, linking ID/OOD generalization to non-stationary settings. It decomposes the agent-environment interaction using a POMDP framework, identifying internal, agent-driven, and external, environment-driven shifts, with explicit, implicit, and hybrid types defined by the shifted-time boundary. The work introduces an evaluation framework to measure shift impact through performance degradation and recovery metrics, enabling systematic analysis of RL robustness.

arxiv arXiv cs.LG · 10d ago

Post-Hoc Falsification Operators Fail to Improve Accuracy in Small Code Models

A measurement study finds that 26 semantic post-hoc operators do not improve held-out accuracy over Best-of-N in frozen small code models. While some operators reduce compute usage or recover correct programs, none outperform BoN in accuracy, due to systemic limitations like coverage walls and consensus traps. An expression-layer recovery (M1) improves performance on HumanEval+ by 12 tasks, with no harm or leakage, and shows consistent results across model cells.

arxiv arXiv cs.LG · 10d ago

HABC Improves RL Fine-Tuning of VLAs with Sparse Outcomes

Hierarchical Advantage-Weighted Behavior Cloning (HABC) enhances online RL fine-tuning of vision-language agents by using separate critic heads for viability and efficiency. It combines their outputs via a state-adaptive gate and applies per-transition weights, while intervention-aware credit assignment prevents supervision leakage. In real-robot experiments, HABC boosts success rates to 92%, 88%, and 38% on three bimanual tasks, surpassing SFT baselines of 36%, 44%, and 12%.