Allen AI — korshunov.ai

ROVE: Reinforcement Learning with Human Interventions for Humanoid Manipulation

ROVE enables humanoid Vision-Language-Action models to learn effective manipulation behaviors from imperfect human interventions. It combines a human-in-the-loop data collection pipeline with Optimistic Value Estimation and cross-embodiment supervision to prioritize high-value actions and improve robustness. Through iterative rollout and intervention cycles, ROVE outperforms baseline methods on real-world, contact-rich manipulation tasks.

arxiv arXiv cs.LG · 9d ago

HABC Improves RL Fine-Tuning of VLAs with Sparse Outcomes

Hierarchical Advantage-Weighted Behavior Cloning (HABC) enhances online RL fine-tuning of vision-language-action models (VLAs) by using separate critic heads for viability and efficiency. It combines their outputs via a state-adaptive gate and applies per-transition weights, while intervention-aware credit assignment prevents supervision leakage. In real-robot experiments, HABC reaches success rates of 92%, 88%, and 38% on three bimanual tasks, compared with 36%, 44%, and 12% for SFT baselines.
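The summary does not give HABC's formulas, so the following is only a rough sketch of how gated, advantage-weighted behavior cloning could look in general. The function names, the convex gate combination, the exponential weighting with temperature, and the weight clip are all assumptions borrowed from generic advantage-weighted regression, not the paper's definitions.

```python
import math


def gated_advantage(adv_viability, adv_efficiency, gate):
    """Blend two critic-head advantages with a gate in [0, 1] (assumed form)."""
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    return gate * adv_viability + (1.0 - gate) * adv_efficiency


def bc_weight(advantage, temperature=1.0, max_weight=20.0):
    """Per-transition weight exp(A / temperature), clipped for stability."""
    return min(math.exp(advantage / temperature), max_weight)


def weighted_bc_loss(log_probs, weights):
    """Weighted negative log-likelihood of the demonstrated actions."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must have a positive sum")
    return -sum(w * lp for w, lp in zip(weights, log_probs)) / total


# Hypothetical batch: (viability advantage, efficiency advantage, gate, log-prob).
batch = [(1.0, -1.0, 0.5, -1.0), (0.4, 0.2, 0.9, -3.0)]
weights = [bc_weight(gated_advantage(v, e, g)) for v, e, g, _ in batch]
loss = weighted_bc_loss([lp for *_, lp in batch], weights)
```

Transitions with a higher blended advantage receive a larger weight, so the cloning loss leans toward them; the clip keeps a single large advantage from dominating the batch.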

arxiv arXiv cs.LG · 8d ago

Deep Reinforcement Learning for Minimum Zero-Forcing Sets

This paper proposes SD-ZFS, a deep reinforcement learning framework adapted from S2V-DQN, to solve the NP-hard minimum zero-forcing set problem on undirected graphs. The framework demonstrates strong performance compared to optimal solutions and greedy heuristics, showing effective generalization, scalability, and transfer across diverse graph structures.
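For readers unfamiliar with the problem, the standard zero-forcing rule says that a colored vertex with exactly one uncolored neighbor forces that neighbor to become colored; a seed set is zero-forcing if repeated forcing colors the whole graph. The sketch below implements that rule on an adjacency dictionary, plus one simple greedy baseline. The function names and this particular greedy rule are illustrative assumptions and are not taken from the paper.

```python
def zero_forcing_closure(adj, seed):
    """Apply the color-change rule until no further vertex can be forced."""
    colored = set(seed)
    changed = True
    while changed:
        changed = False
        for v in list(colored):
            uncolored = [u for u in adj[v] if u not in colored]
            if len(uncolored) == 1:  # v forces its only uncolored neighbor
                colored.add(uncolored[0])
                changed = True
    return colored


def is_zero_forcing_set(adj, seed):
    """True if the seed set eventually colors every vertex."""
    return len(zero_forcing_closure(adj, seed)) == len(adj)


def greedy_zero_forcing_set(adj):
    """Hypothetical greedy baseline: add the vertex that enlarges the closure most."""
    chosen = set()
    while not is_zero_forcing_set(adj, chosen):
        candidates = [v for v in sorted(adj) if v not in chosen]
        best = max(
            candidates,
            key=lambda v: len(zero_forcing_closure(adj, chosen | {v})),
        )
        chosen.add(best)
    return chosen


# Path graph 0-1-2-3: one endpoint suffices, an interior vertex alone does not.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
# Complete graph on 4 vertices: three seeds are needed.
k4 = {v: [u for u in range(4) if u != v] for v in range(4)}
```

The greedy result is always a valid zero-forcing set but is not guaranteed to be minimum, which is the gap a learned policy would aim to close.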

arxiv arXiv cs.LG · 8d ago

Learning Fair Pareto-Optimal Policies in Multi-Objective Reinforcement Learning

The paper introduces a framework for multi-policy multi-objective reinforcement learning that learns a set of Pareto-optimal policies ensuring fairness across diverse user preferences. It proves fair policies remain within the convex coverage set for concave welfare functions and proposes three algorithms that incorporate non-stationary and stochastic policy dynamics. Empirical results show these methods effectively learn fair policies adaptable to varying user preferences.

arxiv arXiv cs.AI · 8d ago

EAGG: Embodiment-Aligned Grasp Generation via Geometry-Aware Graph Conditioning

EAGG introduces a grasp generator that aligns embodiment structure within a shared model using topology-aware graphs and geometry-aware tokens. It achieves 56.17% average grasp success on MultiGripperGrasp, matching specialized models within 1.10 percentage points and reducing median contact distance from 0.239 cm to 0.189 cm.

arxiv arXiv cs.AI · 8d ago

Learning Fair Pareto-Optimal Policies in Multi-Objective Reinforcement Learning

The paper introduces a framework for multi-policy multi-objective reinforcement learning that learns a set of Pareto-optimal policies ensuring fairness across diverse user preferences. It proves fair policies remain within the convex coverage set for concave welfare functions like GGF and proposes three algorithms that incorporate non-stationary and stochastic policies to adapt to historical inequities. Empirical results show these methods effectively learn fair policies across multiple domains.
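As a brief illustration of the welfare function named above: the generalized Gini function (GGF) in its usual textbook form sorts the per-objective returns in ascending order and takes a weighted sum with non-increasing weights, so the worst-off objective counts most. The sketch below assumes that standard definition; the function name and the example weights are illustrative and not taken from the paper.

```python
def ggf(utilities, weights):
    """Generalized Gini welfare: non-increasing weights on ascending utilities."""
    if len(utilities) != len(weights):
        raise ValueError("utilities and weights must have the same length")
    if any(a < b for a, b in zip(weights, weights[1:])):
        raise ValueError("weights must be non-increasing")
    # The largest weight is paired with the smallest utility.
    return sum(w * u for w, u in zip(weights, sorted(utilities)))


weights = [0.75, 0.25]            # hypothetical weights, largest first
unequal = ggf([1.0, 3.0], weights)
equal = ggf([2.0, 2.0], weights)
```

Both return vectors have the same total, yet the balanced one scores higher, which is the sense in which maximizing a GGF favors fair outcomes. The function is also symmetric: reordering the objectives does not change the score.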

media Interconnects · 9d ago

Frontier Post-Training Recipe Review with Finbarr Timbers

The podcast reviews the evolution of post-training recipes in large language models, from InstructGPT to 2026 frontier models. It highlights Multi-Teacher On-Policy Distillation (MOPD) as the dominant pattern, where domain-specialist models are trained and then distilled into a general student model via on-policy distillation, scaling to over 10 teachers in models like DeepSeek V4 and Nemotron 3 Ultra.

arxiv arXiv cs.AI · 9d ago

Unified Causal-Origin Taxonomy for Distributional Shifts in RL

This paper introduces a unified causal-origin taxonomy that categorizes distributional shifts in reinforcement learning into internal (agent-driven) and external (environment-driven) sources. It unifies ID/OOD generalization and non-stationary settings by framing shifts as structured changes in the agent-environment interaction process, using a POMDP decomposition and a shifted-time boundary perspective.

arxiv arXiv cs.AI · 9d ago

CircuitLasso: Scalable Circuit Learning for LLM Interpretability

CircuitLasso proposes a scalable method for learning sparse circuits in large language models using sparse linear regression. It achieves structural accuracy comparable to state-of-the-art intervention-based methods at significantly lower computational cost, while enabling efficient discovery of semantic feature propagation and improving performance on domain-generalization tasks with reduced cost.
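The summary names sparse linear regression as the core tool but gives no details, so the following is only a generic Lasso sketch solved by coordinate descent, not CircuitLasso itself. How the paper builds its regression targets and inputs from model activations is not stated here; the data and all names below are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values inside the band become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iters=200):
    """Minimize (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1 by cyclic updates."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    residual = list(y)  # equals y - Xw for the initial w = 0
    for _ in range(n_iters):
        for j in range(d):
            col = [row[j] for row in X]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue  # an all-zero column carries no signal
            # Correlation of column j with the residual that excludes w[j].
            rho = sum(c * (r + w[j] * c) for c, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, lam) / z
            delta = new_w - w[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                w[j] = new_w
    return w


# Toy data: the target depends only on the first of two orthogonal inputs.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.0, 2.0, -2.0, -2.0]
w = lasso_coordinate_descent(X, y, lam=0.1)
```

The L1 penalty sets the irrelevant coefficient exactly to zero and shrinks the relevant one slightly; exact zeros of this kind are what make a regression-based approach yield a sparse set of connections.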

arxiv arXiv cs.LG · 9d ago

Unified Causal-Origin Taxonomy of Distributional Shifts in RL

This paper proposes a unified causal-origin taxonomy for distributional shifts in reinforcement learning, linking ID/OOD generalization to non-stationary settings. It decomposes the agent-environment interaction using a POMDP framework, identifying internal (agent-driven) and external (environment-driven) shifts, with explicit, implicit, and hybrid types defined by the shifted-time boundary. The work also introduces an evaluation framework that measures shift impact through performance degradation and recovery metrics, enabling systematic analysis of RL robustness.

arxiv arXiv cs.LG · 9d ago

CircuitLasso: Scalable Circuit Learning for LLM Interpretability

CircuitLasso enables scalable circuit learning in large language models by using sparse linear regression. It recovers circuits with structural accuracy matching state-of-the-art methods at significantly lower computational cost, and demonstrates human-interpretable semantic propagation through model components. The learned circuits achieve comparable performance on a domain-generalization task with reduced cost.
