Researchers propose PhysMani, a framework for manipulating fast-moving targets in unstructured 3D environments. It couples a physics-principled 3D Gaussian world model with a future-aware action policy model.

  • The world model learns a divergence-free Gaussian velocity field via online optimization for physically grounded future dynamics prediction.
  • The policy model integrates predicted 3D scene future dynamics through a learnable token-based cross-attention module.
  • The authors introduce PhysMani-Bench, a dynamic manipulation benchmark consisting of 16 tasks.
  • PhysMani achieves higher success rates than strong baselines in both simulation and real-world robot experiments.
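
To illustrate the token-based cross-attention idea from the policy-model bullet, here is a minimal single-head sketch in NumPy. It is not the authors' implementation: the dimensions, the random weights standing in for learned parameters, and the names `tokens`, `future_features`, and `cross_attention` are all assumptions made for illustration. The learnable tokens act as queries, and the predicted future scene features supply the keys and values.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(tokens, features, w_q, w_k, w_v):
    """Single-head cross-attention: tokens query the feature set."""
    q = tokens @ w_q        # (num_tokens, d)
    k = features @ w_k      # (num_features, d)
    v = features @ w_v      # (num_features, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)   # each token's weights over features
    return attn @ v, attn

# Hypothetical sizes; random values stand in for learned parameters
# and for features of the predicted future dynamics.
rng = np.random.default_rng(0)
d_model, num_tokens, num_features = 16, 4, 32
tokens = rng.normal(size=(num_tokens, d_model))
future_features = rng.normal(size=(num_features, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
                 for _ in range(3))

summary, attn = cross_attention(tokens, future_features, w_q, w_k, w_v)
print(summary.shape)  # one fixed-size summary vector per token
```

A fixed number of tokens yields a fixed-size summary regardless of how many future features the world model produces, which is one plausible reason to use this design.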

This approach provides accurate 3D geometry and physically meaningful forecasting for embodied AI systems.
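
To make "divergence-free" concrete: a velocity field v is divergence-free when ∇·v = 0, which corresponds to volume-preserving motion. One standard way to obtain such a field is to define it as the curl of a vector potential, since the divergence of a curl vanishes identically. The sketch below uses that construction with a hypothetical potential built from Gaussian bumps. Whether PhysMani parameterizes its field this way is not stated in the summary, so treat the construction and all names (`potential`, `velocity`, `divergence`) as assumptions.

```python
import math

def potential(p, centers, weights, sigma=0.5):
    """Hypothetical vector potential A(p): a sum of isotropic Gaussian bumps."""
    ax = ay = az = 0.0
    for c, w in zip(centers, weights):
        r2 = sum((pi - ci) ** 2 for pi, ci in zip(p, c))
        g = math.exp(-r2 / (2.0 * sigma ** 2))
        ax += w[0] * g
        ay += w[1] * g
        az += w[2] * g
    return (ax, ay, az)

def partial(f, p, axis, comp, h=1e-3):
    """Central difference of component `comp` of f along `axis`."""
    hi, lo = list(p), list(p)
    hi[axis] += h
    lo[axis] -= h
    return (f(hi)[comp] - f(lo)[comp]) / (2.0 * h)

def velocity(p, centers, weights):
    """v = curl(A), which is divergence-free by construction."""
    A = lambda q: potential(q, centers, weights)
    return (
        partial(A, p, 1, 2) - partial(A, p, 2, 1),
        partial(A, p, 2, 0) - partial(A, p, 0, 2),
        partial(A, p, 0, 1) - partial(A, p, 1, 0),
    )

def divergence(p, centers, weights):
    """Numerical divergence of the velocity field at point p."""
    v = lambda q: velocity(q, centers, weights)
    return sum(partial(v, p, i, i) for i in range(3))

# Illustrative values only.
centers = [(0.0, 0.0, 0.0), (0.4, -0.2, 0.1)]
weights = [(1.0, 0.5, -0.3), (-0.7, 0.2, 0.9)]
point = (0.1, 0.2, -0.1)

print(velocity(point, centers, weights))    # a non-trivial flow vector
print(divergence(point, centers, weights))  # zero up to floating-point error
```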