Topic · Evaluation & benchmarks
arXiv cs.LG · 9d ago

HABC Improves RL Fine-Tuning of VLAs with Sparse Outcomes

Hierarchical Advantage-Weighted Behavior Cloning (HABC) improves online RL fine-tuning of vision-language-action (VLA) models by training separate critic heads for viability and efficiency. A state-adaptive gate combines the two heads' outputs, and the result is applied as per-transition weights in the behavior-cloning objective, while intervention-aware credit assignment prevents supervision leakage. In real-robot experiments on three bimanual tasks, HABC raises success rates to 92%, 88%, and 38%, compared with 36%, 44%, and 12% for supervised fine-tuning (SFT) baselines.
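The gated, advantage-weighted behavior-cloning idea can be sketched as follows. This is a minimal illustration under stated assumptions, not HABC's implementation: the linear sigmoid gate, the convex blend of the two heads' advantages, and the clipped AWR-style weights `exp(A / beta)` are all assumptions, and every function name (`state_gate`, `combined_advantage`, `transition_weights`, `weighted_bc_loss`) is hypothetical. Intervention-aware credit assignment is not modeled here.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def state_gate(state_features, w, b):
    # Hypothetical linear gate: maps each state to a scalar in (0, 1).
    return sigmoid(state_features @ w + b)


def combined_advantage(adv_viability, adv_efficiency, gate):
    # Convex combination of the two critic heads' advantage estimates.
    return gate * adv_viability + (1.0 - gate) * adv_efficiency


def transition_weights(advantage, beta=1.0, max_weight=20.0):
    # AWR-style exponentiated advantages, clipped so a few transitions cannot dominate.
    return np.minimum(np.exp(advantage / beta), max_weight)


def weighted_bc_loss(log_probs, weights):
    # Weighted negative log-likelihood of the logged actions, normalized by total weight.
    return float(-(weights * log_probs).sum() / weights.sum())


# Toy batch of 4 transitions with 3-dimensional state features.
rng = np.random.default_rng(0)
states = rng.normal(size=(4, 3))
gate = state_gate(states, w=np.array([0.5, -0.2, 0.1]), b=0.0)
adv = combined_advantage(
    np.array([1.0, -0.5, 0.2, 0.0]),  # viability-head advantages
    np.array([0.3, 0.1, -1.0, 0.0]),  # efficiency-head advantages
    gate,
)
weights = transition_weights(adv, beta=0.5)
loss = weighted_bc_loss(np.log(np.array([0.9, 0.2, 0.5, 0.7])), weights)
```

The gate lets each state decide how much the viability signal versus the efficiency signal should drive imitation. Clipping the exponentiated advantage is a common stabilizer for this family of methods rather than something the summary specifies.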

arXiv cs.AI · 9d ago

Variance in LLM Circuit Discovery: Causes and Mitigations

This paper analyzes variance in circuit discovery for large language models and identifies three sources: resampling, rephrasing, and sample-wise variance. It shows that CEAP reduces resampling variance, and it argues that rephrasing variance arises because different prompt templates activate different circuits, which suggests LLMs may be inherently hard to steer. The study also finds that sparsity does not resolve these issues, and that sample-wise variance is largely benign because it stems from selective contribution scaling that affects unfaithfulness scores.
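One simple way to make "resampling variance" concrete is to rerun circuit discovery on resampled prompt sets and measure how much the recovered circuits agree. The sketch below is an assumption for illustration, not the paper's metric and not CEAP: it keeps the top-k edges by absolute attribution score in each run and reports the mean pairwise Jaccard overlap. The edge names and scores are made up.

```python
from itertools import combinations


def top_k_edges(scores, k):
    # Keep the k edges with the largest absolute attribution score.
    ranked = sorted(scores, key=lambda e: abs(scores[e]), reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def circuit_stability(runs, k):
    # Mean pairwise Jaccard overlap of the top-k circuits across runs (1.0 = identical).
    circuits = [top_k_edges(r, k) for r in runs]
    pairs = list(combinations(circuits, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical edge-attribution scores from three resampled discovery runs.
run1 = {"a->b": 0.9, "b->c": 0.8, "c->d": 0.1, "a->d": 0.05}
run2 = {"a->b": 0.85, "b->c": 0.1, "c->d": 0.7, "a->d": 0.02}
run3 = {"a->b": 0.95, "b->c": 0.75, "c->d": 0.05, "a->d": 0.01}
stability = circuit_stability([run1, run2, run3], k=2)
```

A score well below 1.0 across resamples signals that the discovered circuit depends on which samples were drawn, which is the kind of instability a variance-reduction method would aim to shrink.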

arXiv cs.AI · 9d ago

MA-SBI: Calibration-Free SBI via Side-Channel Guidance

MA-SBI is a calibration-free simulation-based inference (SBI) framework that uses side-channel text, such as regime labels or instructions, to correct for simulator misspecification. A learned corrector applies observation-space shifts before posterior inference, with no need for ground-truth parameter pairs or retraining. On hide-the-calibration benchmarks, MA-SBI matches the oracle posterior using text alone and outperforms RoPE when data are limited; it also remains robust on real-world epidemiological and cognitive-science datasets.
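The "shift the observation, then infer" pipeline can be illustrated with a toy Gaussian model. This is a minimal sketch under explicit assumptions: the `SHIFTS` lookup table is a hypothetical stand-in for MA-SBI's learned corrector, and the conjugate Gaussian posterior replaces whatever neural posterior estimator the method actually uses. The point is only the ordering: the text-conditioned correction is applied in observation space, and the posterior step itself is left untouched.

```python
import numpy as np

# Hypothetical stand-in for the learned corrector: a lookup from side-channel
# text to an observation-space shift. A real corrector would be a trained model.
SHIFTS = {
    "regime: sensor drift": np.array([2.0, 2.0]),
    "regime: nominal": np.array([0.0, 0.0]),
}


def correct(x_obs, side_text):
    # Remove the text-implied shift; unknown text leaves the observation unchanged.
    return x_obs - SHIFTS.get(side_text, np.zeros_like(x_obs))


def gaussian_posterior(x, prior_mean, prior_var, noise_var):
    # Conjugate update for theta ~ N(prior_mean, prior_var I) and x ~ N(theta, noise_var I).
    post_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)
    post_mean = post_var * (prior_mean / prior_var + x / noise_var)
    return post_mean, post_var


x_obs = np.array([3.0, 1.0])  # real observation from a misspecified setting
x_fixed = correct(x_obs, "regime: sensor drift")
mean, var = gaussian_posterior(x_fixed, prior_mean=0.0, prior_var=1.0, noise_var=1.0)
```

Because the posterior routine never changes, the same simulator-trained inference step can be reused across regimes; only the text-conditioned correction varies, which mirrors the summary's "without retraining" claim.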

arXiv cs.AI · 9d ago

Causal Model of Theory of Mind in AI Conflict

This paper proposes a structural causal model, expressed as a directed acyclic graph, that specifies when Theory of Mind (ToM) engagement is causally warranted in human-machine conflict. The model identifies four exogenous conditions, five mediators, and three causal pathways to ToM activation, with epistemic accuracy as the primary outcome. It offers a resource-rational framework for AI social reasoning and is validated through simulation and human-machine studies.
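A causal DAG of this kind can be encoded as an adjacency list, checked for acyclicity, and queried for the directed paths that run from exogenous conditions to the outcome. The sketch below uses a hypothetical toy graph (`U*` for exogenous conditions, `M*` for mediators, `Y` for the outcome); its nodes and edges are placeholders and do not reproduce the paper's four conditions, five mediators, or three pathways.

```python
from collections import deque

# Hypothetical toy DAG: U* are exogenous conditions, M* mediators, Y the outcome.
G = {
    "U1": ["M1"],
    "U2": ["M1", "M2"],
    "U3": ["M3"],
    "M1": ["M2"],
    "M2": ["Y"],
    "M3": ["Y"],
    "Y": [],
}


def is_acyclic(graph):
    # Kahn's algorithm: a graph is a DAG iff every node can be removed in topological order.
    indeg = {n: 0 for n in graph}
    for children in graph.values():
        for c in children:
            indeg[c] += 1
    queue = deque(n for n, d in indeg.items() if d == 0)
    removed = 0
    while queue:
        n = queue.popleft()
        removed += 1
        for c in graph[n]:
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    return removed == len(graph)


def paths_to(graph, start, target):
    # All directed paths from start to target (finite because the graph is acyclic).
    if start == target:
        return [[target]]
    return [[start] + p for c in graph[start] for p in paths_to(graph, c, target)]


exogenous = [n for n in G if n.startswith("U")]
pathways = [p for u in exogenous for p in paths_to(G, u, "Y")]
```

Enumerating paths this way makes each candidate "pathway" explicit, so one can ask which exogenous condition reaches the outcome through which mediators; a full structural causal model would additionally attach a structural equation to each node.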

arXiv cs.AI · 9d ago

Bayesian Audits Reveal Inconsistent AI Evaluation Timelines

Public AI evaluation archives show that the same terminal result can arise from two distinct pre-terminal histories, which imply estimated times to reach 95% of the performance ceiling of either 23.03 or 75.13. A candidate selection-aware frontier model fails synthetic-recovery and uncertainty-calibration checks and is rejected by fixed audit gates. An archive-and-adjudication protocol verifies timing boundaries and falsifies unsupported frontier claims.
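Why a single terminal result cannot pin down the time to 95% of the ceiling can be shown with a saturating curve. This sketch assumes a logistic performance curve `c / (1 + exp(-k (t - t0)))`, which is an illustrative choice, not the paper's Bayesian model. Two hypothetical growth rates `k` are each fitted to pass through the same terminal point, yet they imply very different 95%-of-ceiling times. All numbers are made up and unrelated to the 23.03 and 75.13 estimates.

```python
import math


def logistic(t, c, k, t0):
    # Saturating performance curve with ceiling c, rate k, and midpoint t0.
    return c / (1.0 + math.exp(-k * (t - t0)))


def midpoint_through(T, y_T, c, k):
    # Choose t0 so the curve passes through the terminal point (T, y_T); needs 0 < y_T < c.
    return T - math.log(y_T / (c - y_T)) / k


def time_to_fraction(k, t0, p=0.95):
    # Solve logistic(t) = p * c for t; the result does not depend on the ceiling c.
    return t0 + math.log(p / (1.0 - p)) / k


T, y_T, c = 10.0, 0.6, 1.0  # hypothetical terminal time, terminal score, ceiling
histories = {}
for k in (0.2, 1.0):  # two hypothetical pre-terminal growth rates
    t0 = midpoint_through(T, y_T, c, k)
    histories[k] = (t0, time_to_fraction(k, t0))
```

Both histories end at exactly the same terminal score, so the terminal result alone cannot distinguish them; only the pre-terminal trajectory, which an archive can preserve, resolves the timing question.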