Reasoning models
arXiv cs.LG · 8d ago

Reversal Q-Learning: A New Off-Policy RL Algorithm

Reversal Q-Learning (RQL) is a new off-policy reinforcement learning algorithm that trains a flow policy from prior data. It models flow refinement steps as actions in an expanded Markov decision process and uses virtual on-policy trajectories obtained via reversal, which lets RQL learn effectively offline without backpropagation through time. In experiments on 50 robotic tasks, RQL achieves the best average performance among state-of-the-art flow-based offline RL methods.
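
To make the expanded-MDP idea concrete, here is a minimal Python (NumPy) sketch of how a flow policy's refinement steps could be unrolled into transitions, one per step. It assumes Euler integration, a hypothetical velocity field (`toy_velocity`), and reward only on the final step; all names are illustrative, and the sketch does not implement RQL's reversal trick or its Q-learning update.

```python
from typing import Callable, List, Tuple
import numpy as np

# Hypothetical learned velocity field v(s, x, tau) of a flow policy.
Velocity = Callable[[np.ndarray, np.ndarray, float], np.ndarray]
# One expanded-MDP transition: ((s, x_k, tau_k), x_{k+1}, reward).
Transition = Tuple[Tuple[np.ndarray, np.ndarray, float], np.ndarray, float]


def unroll_flow_policy(
    state: np.ndarray,
    velocity: Velocity,
    reward_fn: Callable[[np.ndarray, np.ndarray], float],
    action_dim: int,
    num_steps: int,
    rng: np.random.Generator,
) -> Tuple[np.ndarray, List[Transition]]:
    """Unroll num_steps Euler refinement steps as transitions of an expanded MDP.

    Expanded state: (environment state s, partial action x_k, flow time tau_k).
    Expanded action: the refined sample x_{k+1}.
    Only the last step, which emits the environment action, carries reward.
    """
    h = 1.0 / num_steps
    x = rng.standard_normal(action_dim)  # initial noise sample x_0 ~ N(0, I)
    transitions: List[Transition] = []
    for k in range(num_steps):
        tau = k * h
        x_next = x + h * velocity(state, x, tau)
        reward = reward_fn(state, x_next) if k == num_steps - 1 else 0.0
        transitions.append(((state, x, tau), x_next, reward))
        x = x_next
    return x, transitions


# Toy velocity field that moves any x_0 in a straight line to target(s) = 0.5 * s by tau = 1.
def toy_velocity(s, x, tau):
    return (0.5 * s - x) / (1.0 - tau)


def toy_reward(s, a):
    return -float(np.sum((a - 0.5 * s) ** 2))


action, traj = unroll_flow_policy(
    np.array([1.0, -2.0]), toy_velocity, toy_reward,
    action_dim=2, num_steps=4, rng=np.random.default_rng(0),
)
```

Each refinement step becomes an ordinary transition, so a standard one-step value update could in principle be applied per step instead of differentiating through the whole sampling chain.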

arXiv cs.LG · 8d ago

Credit-in-Event: Re-Anchoring Event Credit in Dynamics Models

The Credit-in-Event paper identifies temporal credit dilution in learned dynamics models and addresses it with CREST, a label-free, training-free readout that re-anchors pooled representations by estimating transient event cores and applying an event-versus-rest contrast. CREST reduces out-of-distribution error across multiple systems and data types, and ablations confirm that the improvement stems from event-core credit re-anchoring rather than generic locality or stability priors.

arXiv cs.LG · 8d ago

Fairness in Graph Neural Networks via Laplacian Adaptation

A new framework modifies the Laplacian operator in graph diffusion to enhance fairness by incorporating subspace projections, spectral adjustments, and frequency-based filtering. The method leverages graph diffusion's smoothing properties to mitigate bias, with theoretical analysis and empirical validation on synthetic and real-world datasets showing improved fairness without significant computational overhead.
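
As a rough illustration of pairing spectral filtering with a subspace projection, the Python (NumPy) sketch below uses the heat kernel exp(-tL) as a low-pass diffusion filter and then projects out the centered sensitive-attribute direction. This is a generic construction chosen for illustration, not the paper's modified Laplacian; `fair_diffusion`, `heat_filter`, and `sensitive_projector` are hypothetical names.

```python
import numpy as np


def laplacian(adj: np.ndarray) -> np.ndarray:
    """Combinatorial graph Laplacian L = D - A for an undirected graph."""
    return np.diag(adj.sum(axis=1)) - adj


def heat_filter(adj: np.ndarray, t: float) -> np.ndarray:
    """Spectral low-pass filter U diag(exp(-t * lambda)) U^T, i.e. exp(-t L)."""
    eigvals, eigvecs = np.linalg.eigh(laplacian(adj))  # L is symmetric
    return eigvecs @ np.diag(np.exp(-t * eigvals)) @ eigvecs.T


def sensitive_projector(s) -> np.ndarray:
    """Projector onto the orthogonal complement of the centered sensitive vector."""
    s_c = np.asarray(s, dtype=float)
    s_c = s_c - s_c.mean()  # assumes s is not constant across nodes
    return np.eye(len(s_c)) - np.outer(s_c, s_c) / (s_c @ s_c)


def fair_diffusion(adj: np.ndarray, x: np.ndarray, s, t: float = 1.0) -> np.ndarray:
    """Diffuse node features, then remove their linear correlation with s."""
    return sensitive_projector(s) @ heat_filter(adj, t) @ x


# Path graph 0 - 1 - 2 with one feature per node and a binary sensitive attribute.
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
x = np.array([[1.0], [2.0], [6.0]])
s = [0, 1, 1]
y = fair_diffusion(adj, x, s, t=0.5)
```

Because the centered sensitive vector is orthogonal to the all-ones vector, the projector leaves constant signals untouched, so the smoothing toward the graph mean that diffusion provides is preserved.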

arXiv cs.LG · 8d ago

Vision-language models don't always need images for chest X-ray accuracy

A causal audit shows that many vision-language models achieve high chest radiograph accuracy without using the images. Text-only models match multimodal models in performance and outperform them in grounding, and accuracy and confidence flags appear only when the image is actually used. These findings suggest that accuracy alone is insufficient to validate clinical deployment and that grounding must also be assessed.
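
One simple way to check image use is to intervene on the input: score each case with and without the image, then compare accuracies and count prediction flips. The Python sketch below assumes a hypothetical `predict(image, prompt)` interface and uses toy string "images"; it illustrates an input-ablation check and is not the audit protocol from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical model interface: (image or None, prompt) -> predicted label.
Predictor = Callable[[Optional[str], str], str]


@dataclass
class AuditResult:
    acc_with_image: float
    acc_without_image: float
    flip_rate: float  # share of cases whose prediction changes when the image is withheld


def image_ablation_audit(
    predict: Predictor,
    images: Sequence[Optional[str]],
    prompts: Sequence[str],
    labels: Sequence[str],
) -> AuditResult:
    n = len(labels)
    with_img = [predict(img, p) for img, p in zip(images, prompts)]
    # Intervention: withhold the image while keeping the prompt fixed.
    without_img = [predict(None, p) for p in prompts]
    acc_with = sum(a == y for a, y in zip(with_img, labels)) / n
    acc_without = sum(b == y for b, y in zip(without_img, labels)) / n
    flips = sum(a != b for a, b in zip(with_img, without_img)) / n
    return AuditResult(acc_with, acc_without, flips)


# Toy data: each "image" is a stand-in string that encodes the true finding.
images = ["pneumonia", "normal"]
prompts = ["History: fever and productive cough.", "History: routine pre-op check."]
labels = ["pneumonia", "normal"]


def text_only(img, prompt):
    # Ignores the image entirely and keys on the clinical history.
    return "pneumonia" if "cough" in prompt else "normal"


def image_reader(img, prompt):
    # Reads the finding from the image; falls back to "normal" without one.
    return img if img is not None else "normal"


print(image_ablation_audit(text_only, images, prompts, labels))
# AuditResult(acc_with_image=1.0, acc_without_image=1.0, flip_rate=0.0)
print(image_ablation_audit(image_reader, images, prompts, labels))
# AuditResult(acc_with_image=1.0, acc_without_image=0.5, flip_rate=0.5)
```

The text-only predictor scores perfectly with or without the image, which is exactly the case where accuracy alone reveals nothing about whether the image was used.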

arXiv cs.LG · 8d ago

Blind Recovery of Latent Domains via Unsupervised Symmetry Discovery

The paper proposes an unsupervised framework to recover latent domains and signals from corrupted observations by discovering data symmetries. It models observations as linear measurements of signals from a latent random field and uses a shallow group-convolutional network with stationarity and locality constraints to learn latent symmetry actions and filters, enabling recovery from unstructured data.