RE4: Transformation-aware Imitation of Object Interactions Using Manipulation Modes

This paper introduces RE4, an imitation learning framework for object interaction tasks that bridges principled manipulation theory and modern benchmark-driven learning, aiming to preserve both performance and interpretability. The approach uses lightweight, self-supervised pose estimation together with mode-aware transformations to retrieve demonstrations and replan from them.
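To make the idea of a pose-based, mode-aware transformation concrete, here is a minimal sketch for a planar (Push-T-style) setting. It assumes, as an illustration only, that during a contact mode the end-effector moves rigidly with the object, so a demonstrated segment can be carried over by the change in object pose, p' = T_cur · T_demo⁻¹ · p. The names `relative_transform` and `transform_segment` and the (x, y, theta) pose convention are hypothetical and not taken from the paper.

```python
import math
from typing import Callable, List, Tuple

Pose2D = Tuple[float, float, float]  # (x, y, theta) of the object in the world frame
Point = Tuple[float, float]          # planar end-effector position


def relative_transform(demo_pose: Pose2D, cur_pose: Pose2D) -> Callable[[Point], Point]:
    """Return a map p -> T_cur * T_demo^{-1} * p on planar points (hypothetical helper)."""
    dx, dy, dth = demo_pose
    cx, cy, cth = cur_pose

    def apply(p: Point) -> Point:
        # Express p in the demo object's local frame (apply T_demo^{-1}).
        px, py = p[0] - dx, p[1] - dy
        c, s = math.cos(-dth), math.sin(-dth)
        lx, ly = c * px - s * py, s * px + c * py
        # Map the local point back to the world using the current object pose.
        c, s = math.cos(cth), math.sin(cth)
        return (cx + c * lx - s * ly, cy + s * lx + c * ly)

    return apply


def transform_segment(segment: List[Point], demo_pose: Pose2D, cur_pose: Pose2D) -> List[Point]:
    """Rigidly carry a demonstrated contact-mode segment along with the object (assumed model)."""
    f = relative_transform(demo_pose, cur_pose)
    return [f(p) for p in segment]
```

For example, with the demo object at the origin and the current object at (1, 2) rotated by 90 degrees, the demonstrated contact point (1, 0) maps to approximately (1, 3). Whether a segment may be transformed this way would depend on its manipulation mode, which is why the mode matters in the first place.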

Proposes lightweight training for model-free pose estimation of target objects using self-supervision over demonstration data.
Implements manipulation mode-aware retrieval of demonstrations to inform the learning process.
Applies a mode-aware transformation to the retrieved demonstration, followed by a replan step that connects the current state to the retrieval point while preserving mode constraints.
Evaluates the framework on state-based and image-based benchmarks in Push-T and Robomimic, including an adversarial benchmark for sparse data regions.
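The retrieval and replan steps listed above can be sketched as follows, under stated assumptions rather than as the paper's method. Demonstration states are assumed to carry a discrete mode label (the hypothetical labels "free" and "contact"), retrieval is a nearest-neighbour search over object poses restricted to states in the requested mode, and the replan is a simple straight-line interpolation of the end-effector to the retrieved state. `DemoState`, `pose_distance`, `angle_weight`, `retrieve`, and `replan` are all illustrative names.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Pose2D = Tuple[float, float, float]  # (x, y, theta)


@dataclass
class DemoState:
    # Hypothetical record for one timestep of a demonstration.
    object_pose: Pose2D
    ee_pos: Point
    mode: str  # assumed mode labels, e.g. "free" or "contact"


def pose_distance(a: Pose2D, b: Pose2D, angle_weight: float = 0.1) -> float:
    # Planar distance plus a weighted, wrapped angular difference.
    dth = math.atan2(math.sin(a[2] - b[2]), math.cos(a[2] - b[2]))
    return math.hypot(a[0] - b[0], a[1] - b[1]) + angle_weight * abs(dth)


def retrieve(demos: Sequence[DemoState], cur_pose: Pose2D, mode: str) -> int:
    """Index of the nearest demo state whose mode matches the requested one."""
    candidates = [i for i, s in enumerate(demos) if s.mode == mode]
    if not candidates:
        raise ValueError(f"no demonstration states in mode {mode!r}")
    return min(candidates, key=lambda i: pose_distance(demos[i].object_pose, cur_pose))


def replan(cur_ee: Point, target_ee: Point, steps: int) -> List[Point]:
    """Straight-line connection to the retrieval point; a stand-in for a real planner."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    return [
        (cur_ee[0] + (target_ee[0] - cur_ee[0]) * k / steps,
         cur_ee[1] + (target_ee[1] - cur_ee[1]) * k / steps)
        for k in range(1, steps + 1)
    ]
```

Filtering by mode before the distance search means a state that is close in pose but in the wrong mode is never retrieved. The straight-line replan ignores obstacles and contact; a practical system would substitute a planner that respects the mode constraints described above.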

The work demonstrates the promise of simple, interpretable building blocks for learning manipulation skills, showing robustness in low-data regimes.