Robotics — korshunov.ai

Robotics

Reward-Petri-Net Interpretation of Temporal Behavior Trees

This paper presents a Reward-Petri-Net interpretation of Temporal Behavior Trees for reinforcement learning. It translates TBTs into Petri Nets, assigning rewards based on structural constraints defined in Linear Temporal Logic, enabling effective learning in complex, long-horizon robotic tasks where vanilla RL fails.
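To make the idea of rewarding progress through a Petri net concrete, here is a minimal sketch of the standard token-firing rule with a reward attached to each transition. It is not the paper's construction: the place names, the transition names, the reward values, and the choice to attach rewards to transitions are all assumptions made for illustration, and the translation from a TBT or from LTL constraints is not shown.

```python
# Hypothetical two-step task: "reach A, then reach B".
# Each transition lists the tokens it consumes, the tokens it produces,
# and an illustrative reward (values are assumptions, not from the paper).
TRANSITIONS = {
    "reach_a": {"inputs": {"start": 1}, "outputs": {"a_done": 1}, "reward": 0.5},
    "reach_b": {"inputs": {"a_done": 1}, "outputs": {"b_done": 1}, "reward": 1.0},
}


def enabled(marking, transition):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(p, 0) >= n for p, n in transition["inputs"].items())


def fire(marking, transition):
    """Fire an enabled transition; return the new marking and its reward."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    for place, n in transition["inputs"].items():
        new_marking[place] = new_marking.get(place, 0) - n
    for place, n in transition["outputs"].items():
        new_marking[place] = new_marking.get(place, 0) + n
    return new_marking, transition["reward"]


marking = {"start": 1}
# "reach_b" cannot fire first, so the ordering constraint is structural.
can_skip_ahead = enabled(marking, TRANSITIONS["reach_b"])
marking, r1 = fire(marking, TRANSITIONS["reach_a"])
marking, r2 = fire(marking, TRANSITIONS["reach_b"])
print(can_skip_ahead, marking, r1 + r2)
```

In this toy version the ordering of subtasks is enforced by the net's structure rather than by the reward values, which is one way to read "structural constraints" in the summary.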

arxiv arXiv cs.CL · 2d ago

A Taxonomy of Conceptual Alignment in Human-Robot Dialogue

The paper proposes a design-centric taxonomy for conceptual alignment in human-robot dialogue, defining it as a bidirectional, co-constructive process. It introduces a dialogue act schema to capture interactional moves that enable alignment, offering a structured framework for analyzing and designing such interactions.

lab NVIDIA Technical Blog · 3d ago

NVIDIA Launches Halos for Robotics: Full-Stack Functional Safety System

NVIDIA has introduced Halos for Robotics, a full-stack functional safety system designed for physical AI. It enables AI-driven safety in unstructured environments where robots operate autonomously alongside humans in factories, warehouses, hospitals, and homes.

arxiv arXiv cs.AI · 6d ago

Frequency-Aware Flow Matching for Robotic Action Generation

Frequency-Aware Flow Matching (FAFM) enables continuous and temporally consistent robotic action generation by transforming discrete action sequences into the frequency domain using the discrete cosine transform. It regularizes first-order temporal derivatives with a Sobolev-type constraint to ensure smooth actions, improving success rates, motion smoothness, and robustness across synthetic and real-world tasks without adding network parameters.
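The two ingredients named in this summary can be illustrated with a short, dependency-free sketch: an orthonormal DCT-II of a one-dimensional action sequence, and a first-order smoothness penalty. This is not the FAFM implementation. The function names, the finite-difference form of the penalty, and the low-pass demo are assumptions for illustration. The frequency-domain form of the penalty relies on the known fact that the DCT-II basis diagonalizes the squared-difference penalty, with weights 2 - 2cos(πk/n).

```python
import math


def dct2(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    coeffs = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * s)
    return coeffs


def idct2(c):
    """Inverse of dct2 (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for t in range(n):
        s = math.sqrt(1.0 / n) * c[0]
        s += sum(
            math.sqrt(2.0 / n) * c[k] * math.cos(math.pi * (t + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out


def first_order_penalty(x):
    """Sum of squared finite differences in the time domain."""
    return sum((b - a) ** 2 for a, b in zip(x, x[1:]))


def penalty_from_coeffs(c):
    """The same penalty computed from DCT-II coefficients."""
    n = len(c)
    return sum((2.0 - 2.0 * math.cos(math.pi * k / n)) * ck ** 2
               for k, ck in enumerate(c))


def low_pass(x, keep):
    """Zero all but the first `keep` DCT coefficients and transform back."""
    c = [ck if k < keep else 0.0 for k, ck in enumerate(dct2(x))]
    return idct2(c)


jerky = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]  # hypothetical action chunk
smooth = low_pass(jerky, keep=2)
print(first_order_penalty(jerky), first_order_penalty(smooth))
```

Because the penalty weights grow with frequency index k, penalizing first-order differences is equivalent to down-weighting high-frequency coefficients, which is why a frequency-domain representation pairs naturally with a smoothness constraint.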

arxiv arXiv cs.AI · 6d ago

FlowMaps Models Long-Term Multimodal Object Dynamics

FlowMaps is a latent flow matching model that predicts future object locations in 3D environments by learning spatio-temporal patterns from human interactions. It outperforms state-of-the-art methods in dynamic object navigation across more than 600 episodes in both simulated and real-world settings.

arxiv arXiv cs.AI · 6d ago

Finetuning VLA Models Requires Fewer Layers Than Thought

Vision-Language-Action models show severe layer-wise redundancy despite large parameter counts. A training-free compression method using Centered Kernel Alignment removes twin layers, reducing model depth by up to 50% and enabling 40-50% faster training and up to 30% faster inference without performance loss, validated across simulation and real-world robotic tasks.
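For readers unfamiliar with Centered Kernel Alignment, the sketch below computes linear CKA between two activation matrices (rows are examples, columns are features). The summary does not say which CKA variant, which activations, or which similarity threshold the paper uses, so the function names and the toy data here are assumptions for illustration only.

```python
import math


def center_columns(m):
    """Subtract each column's mean so features are zero-centered."""
    rows, cols = len(m), len(m[0])
    means = [sum(r[j] for r in m) / rows for j in range(cols)]
    return [[r[j] - means[j] for j in range(cols)] for r in m]


def cross_frobenius_sq(a, b):
    """Squared Frobenius norm of a^T b for matrices with equal row counts."""
    total = 0.0
    for i in range(len(a[0])):
        for j in range(len(b[0])):
            dot = sum(a[r][i] * b[r][j] for r in range(len(a)))
            total += dot * dot
    return total


def linear_cka(x, y):
    """Linear CKA: ||X^T Y||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered data.

    Raises ZeroDivisionError if either input has only constant columns.
    """
    x, y = center_columns(x), center_columns(y)
    numerator = cross_frobenius_sq(x, y)
    denominator = math.sqrt(cross_frobenius_sq(x, x) * cross_frobenius_sq(y, y))
    return numerator / denominator


# Hypothetical activations of two layers on the same four inputs.
layer_a = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0], [2.0, 2.0]]
layer_b = [[0.5, 1.0], [1.0, 0.0], [2.0, 2.0], [0.0, 3.0]]
print(linear_cka(layer_a, layer_b))
```

A score close to 1 between adjacent layers suggests they compute nearly the same representation, which is the kind of redundancy a depth-pruning method would look for.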

arxiv arXiv cs.AI · 7d ago

Hardware-validated vision-in-the-loop for maritime UAV autonomy

A deep monocular pose estimator processes rendered maritime environments in real time, and its output is fused with IMU data via a delayed Kalman filter. The system enables autonomous indoor flight under perception latency and computational constraints, so that maritime UAV autonomy can be validated safely before shipboard deployment.

arxiv arXiv cs.AI · 7d ago

Robot Uses Prior Team Experience to Improve USAR Rescue Success

A robot initialized with a selected prior collaboration pattern improved rescue success from 25.7% to 41.3% in urban search and rescue (USAR) tasks. It also reduced average task time by 283 seconds, with the greatest benefits observed at the start of interactions, indicating effective early transfer of task knowledge through episodic memory.

arxiv arXiv cs.LG · 8d ago

Reversal Q-Learning: A New Off-Policy RL Algorithm

Reversal Q-Learning (RQL) is a new off-policy reinforcement learning algorithm that trains a flow policy using prior data. By modeling flow refinement steps as actions in an expanded Markov decision process and applying virtual on-policy trajectories via reversal, RQL enables effective offline learning without backpropagation through time. Experiments on 50 robotic tasks show RQL achieves the best average performance among state-of-the-art flow-based offline RL methods.

arxiv arXiv cs.LG · 8d ago

Qwen-RobotManip Achieves Generalization in Robotic Manipulation

Qwen-RobotManip, a Vision-Language-Action foundation model, enables large-scale training through unified alignment across representation, motion, and behavior. It uses open-source data to build a 38,100-hour pretraining corpus and demonstrates emergent generalization, outperforming prior state-of-the-art models in out-of-distribution settings and ranking first in RoboChallenge with a 20% relative improvement on real-robot platforms.

arxiv arXiv cs.AI · 8d ago

EAGG: Embodiment-Aligned Grasp Generation via Geometry-Aware Graph Conditioning

EAGG introduces a grasp generator that aligns embodiment structure within a shared model using topology-aware graphs and geometry-aware tokens. It achieves 56.17% average grasp success on MultiGripperGrasp, matching specialized models within 1.10 percentage points and reducing median contact distance from 0.239 cm to 0.189 cm.

arxiv arXiv cs.AI · 8d ago

Visual Verification Enables Inference-time Steering and Autonomous Policy Improvement

VERITAS introduces a generator-verifier framework that enables robots to improve policies in real time without additional training. A visual verifier evaluates actions at inference time, allowing consistent performance gains through verified rollouts that serve as effective supervision for offline policy improvement. Post-training with these verified rollouts matches expert demonstrations in efficiency, without human intervention.

media r/LocalLLaMA · 9d ago

Qwen Robot Suite Announced

Aliyun has launched the Qwen Robot Suite, a new set of AI-powered robotic tools. The suite aims to enable developers to build and deploy intelligent robots with enhanced capabilities.

arxiv arXiv cs.LG · 9d ago

Task-Error Residual Learning for Real-Robot Five-Ball Juggling

A residual learning approach using directional task-error supervision achieves stable five-ball juggling on real robots, converging from the second attempt. The system outperforms human practice timelines and relies on both directional feedback and an informative prior, with a fixed-Jacobian Newton update proving most reliable.

arxiv arXiv cs.LG · 9d ago

ROVE: Reinforcement Learning with Human Interventions for Humanoid Manipulation

ROVE enables humanoid Vision-Language-Action models to learn effective manipulation behaviors using imperfect human interventions. It combines a human-in-the-loop data collection pipeline with Optimistic Value Estimation and cross-embodiment supervision to prioritize high-value actions and improve robustness. ROVE outperforms baseline methods on real-world, contact-rich manipulation tasks through iterative rollout and intervention cycles.

arxiv arXiv cs.LG · 9d ago

Geometric Action Model for Robot Policy Learning

The Geometric Action Model (GAM) enables robot policies to reason about 3D physical interactions by repurposing a pretrained geometric foundation model. GAM splits the GFM to serve as both an observation encoder and a causal future predictor, then routes predicted future geometry and actions through the same backbone, achieving accurate, robust, and efficient manipulation performance in simulation and real-robot benchmarks.