This paper introduces RE4, an imitation learning framework that combines principled manipulation theories with modern benchmarks to preserve both performance and interpretability in object interaction tasks. The approach relies on lightweight, self-supervised pose estimation, mode-aware retrieval and transformation of demonstrations, and a replanning step.
- Proposes lightweight training for model-free pose estimation of target objects using self-supervision over demonstration data.
- Implements manipulation-mode-aware retrieval of demonstrations to inform the learning process.
- Applies mode-aware transformation and a replan step that connects to the retrieval point while preserving mode constraints.
- Evaluates the framework on state-based and image-based benchmarks in Push-T and Robomimic, including an adversarial benchmark for sparse data regions.
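The retrieve, transform, and replan steps listed above can be illustrated with a minimal 2D sketch. This is not the paper's implementation: the demonstration format (`mode`, `pose`, `path`), the pose distance, the rigid re-anchoring of the effector path, and the straight-line connector are all assumptions made for illustration, and every function name is hypothetical. Keeping the effector path fixed in the object frame is used here as a simple stand-in for preserving mode constraints.

```python
import math

def to_object_frame(point, pose):
    """Express a world-frame point in the frame of an object at pose (x, y, theta)."""
    x, y, th = pose
    dx, dy = point[0] - x, point[1] - y
    c, s = math.cos(th), math.sin(th)
    return (c * dx + s * dy, -s * dx + c * dy)

def to_world_frame(point, pose):
    """Express an object-frame point in the world frame."""
    x, y, th = pose
    c, s = math.cos(th), math.sin(th)
    return (x + c * point[0] - s * point[1], y + s * point[0] + c * point[1])

def pose_distance(a, b):
    """Assumed metric: translation distance plus wrapped absolute angle difference."""
    dth = math.atan2(math.sin(a[2] - b[2]), math.cos(a[2] - b[2]))
    return math.hypot(a[0] - b[0], a[1] - b[1]) + abs(dth)

def retrieve(demos, pose, mode):
    """Mode-aware retrieval: nearest demonstration among those with the same mode."""
    candidates = [d for d in demos if d["mode"] == mode]
    if not candidates:
        raise ValueError(f"no demonstration for mode {mode!r}")
    return min(candidates, key=lambda d: pose_distance(d["pose"], pose))

def transform(demo, pose):
    """Re-anchor the demo effector path to the current object pose (rigid transform)."""
    return [to_world_frame(to_object_frame(p, demo["pose"]), pose) for p in demo["path"]]

def replan(start, path, steps=5):
    """Prepend a straight-line connector from the current effector position to the path."""
    goal = path[0]
    connector = [
        (start[0] + (goal[0] - start[0]) * i / steps,
         start[1] + (goal[1] - start[1]) * i / steps)
        for i in range(steps)
    ]
    return connector + path

# Hypothetical demonstrations: object pose (x, y, theta) and effector path in world frame.
demos = [
    {"mode": "push", "pose": (0.0, 0.0, 0.0), "path": [(-0.1, 0.0), (0.0, 0.0)]},
    {"mode": "push", "pose": (5.0, 5.0, 0.0), "path": [(4.9, 5.0), (5.0, 5.0)]},
    {"mode": "grasp", "pose": (1.0, 1.0, math.pi / 2), "path": [(1.0, 1.2), (1.0, 1.0)]},
]

current_pose = (1.0, 1.0, math.pi / 2)
demo = retrieve(demos, current_pose, "push")
path = transform(demo, current_pose)
plan = replan((0.0, 0.0), path)
```

In this toy setup the grasp demonstration matches the current object pose exactly, yet it is skipped because its mode differs; the nearest push demonstration is retrieved and its path is rotated and translated with the object before the connector is added.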
The work demonstrates the promise of using simple, interpretable building blocks to learn manipulation skills and shows robustness in low-data regimes.