w$^{2}$VLA introduces a modular vision-language-action model that decouples declarative and procedural knowledge. By restructuring information flow, it enables robust behavior cloning and zero-shot skill transfer to novel, dissimilar objects.
Decoupling Declarative and Procedural Knowledge in Vision-Language-Action Models