RECALL: Active Lifelong Learning for Vision-Language-Action Models
The paper introduces RECALL, an active continual learning paradigm for Vision-Language-Action (VLA) models that addresses the inefficiencies of passive imitation learning. Rather than waiting for robot failures to trigger data collection, the approach uses uncertainty-guided recovery demonstrations to proactively identify states that need supervision. The authors show that this targeted data collection yields more efficient fine-tuning than passively collected demonstrations. However, fine-tuning exclusively on the active recovery data causes catastrophic forgetting of previously learned behaviors. To mitigate this, the work evaluates continual learning techniques such as replay-based data mixing and elastic weight consolidation (EWC). These experiments highlight the tradeoff between plasticity for new tasks and retention of existing capabilities in autoregressive VLAs. Overall, the research finds that uncertainty-guided recovery improves adaptation efficiency, but that incorporating targeted new data into large robot policies remains a significant open challenge.
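
To make the two mitigation techniques named above concrete, here is a minimal sketch of replay-based data mixing and of the standard quadratic elastic weight consolidation penalty. It is an illustration only, not the authors' implementation: the function names `mix_replay` and `ewc_penalty`, the `replay_fraction` parameter, and the flat-list representation of model parameters are all assumptions made for this example, and the summary does not state the mixing ratio or penalty weight the paper uses.

```python
import random


def mix_replay(new_data, old_data, replay_fraction, batch_size, rng):
    """Build one training batch that mixes old (replay) and new samples.

    Hypothetical helper: `replay_fraction` is the share of the batch drawn
    from previously collected data; the rest comes from the new recovery data.
    """
    n_old = round(batch_size * replay_fraction)
    n_new = batch_size - n_old
    # Sample with replacement so either pool may be smaller than its quota.
    batch = rng.choices(old_data, k=n_old) + rng.choices(new_data, k=n_new)
    rng.shuffle(batch)
    return batch


def ewc_penalty(params, anchor_params, fisher_diag, lam):
    """Standard EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2.

    `anchor_params` are the weights before fine-tuning and `fisher_diag` is a
    diagonal Fisher information estimate; both are plain lists here for
    illustration, whereas a real policy would use framework tensors.
    """
    return 0.5 * lam * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher_diag)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    batch = mix_replay(["new_a", "new_b"], ["old_a", "old_b", "old_c"], 0.25, 8, rng)
    print(batch)
    print(ewc_penalty([1.0, 2.0], [0.0, 2.0], [2.0, 5.0], lam=4.0))
```

In this sketch the penalty would be added to the fine-tuning loss, so parameters with large Fisher values are pulled back toward their pre-fine-tuning values, while replay mixing instead preserves old behavior through the data itself. Both expose a single knob (`lam`, `replay_fraction`) that trades plasticity against retention.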