Researchers propose NoContactNoWorries, a transformer-based framework that infers binary contact states during in-hand manipulation by fusing RGB-D vision with robot proprioception. The inferred contact state serves as a scalable pseudo-tactile signal, avoiding the cost and fragility of dedicated tactile sensor hardware.

  • The model uses a transformer to fuse visual data and proprioceptive information for binary contact estimation.
  • A single contact prediction model is trained across multiple objects to support downstream reinforcement learning agents.
  • The inferred contact signal is used for in-hand object reorientation and generalizes to novel objects not seen during training.
  • Experiments conducted in both simulation and on a real-world robot validate the feasibility of this vision-based approach.
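
To make the fusion idea in the points above concrete, the following is a minimal sketch of how visual and proprioceptive inputs might be turned into tokens, mixed by one self-attention step, and read out as a binary contact estimate. It is not the authors' architecture: the feature sizes (32-dimensional patch features, a 7-dimensional joint state), the single attention layer, the class token readout, the 0.5 threshold, and every name in it (`contact_probability`, `W_vis`, `W_prop`, `policy_obs`) are assumptions chosen for illustration, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # shared token width (hypothetical)

def linear(in_dim, out_dim):
    """Random, untrained weights standing in for learned parameters."""
    return rng.normal(0.0, in_dim ** -0.5, size=(in_dim, out_dim))

# Per-modality projections into a shared token space (sizes are assumptions).
W_vis, W_prop = linear(32, D), linear(7, D)
W_q, W_k, W_v = linear(D, D), linear(D, D), linear(D, D)
w_out = linear(D, 1)
cls_token = rng.normal(size=(1, D))  # readout token

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def contact_probability(visual_feats, joint_state):
    """visual_feats: (N, 32) RGB-D patch features; joint_state: (7,) proprioception."""
    vis_tokens = visual_feats @ W_vis               # (N, D)
    prop_token = (joint_state @ W_prop)[None, :]    # (1, D)
    tokens = np.concatenate([cls_token, vis_tokens, prop_token], axis=0)

    # One single-head self-attention step: every token attends to both modalities.
    q, k, v = tokens @ W_q, tokens @ W_k, tokens @ W_v
    attn = softmax(q @ k.T / np.sqrt(D))            # rows sum to 1
    fused = tokens + attn @ v                       # residual connection

    logit = (fused[0] @ w_out).item()               # read out from the class token
    return 1.0 / (1.0 + np.exp(-logit))             # sigmoid -> probability

visual_feats = rng.normal(size=(4, 32))  # placeholder for encoded RGB-D patches
joint_state = rng.normal(size=7)         # placeholder for joint positions
p = contact_probability(visual_feats, joint_state)
in_contact = p > 0.5                     # binary contact state (threshold assumed)

# One plausible way to expose the signal to a policy: append it to the observation.
policy_obs = np.concatenate([joint_state, [float(in_contact)]])
```

In this sketch the contact flag takes the slot in the policy observation that a hardware tactile reading would otherwise fill. How the original work passes the signal to its reinforcement learning agents is not stated in this summary.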

The results indicate that robots can infer physical contact from vision and proprioception, offering a practical alternative to hardware-dependent tactile sensing for dexterous manipulation tasks.