STAITUS: Disentangling Appearance and Pose for Video Object Tracking
The article introduces STAITUS, a unified framework for unsupervised video object tracking. It addresses the limitations of existing slot-based representations by explicitly disentangling appearance from geometric pose. Temporal alignment is applied only in appearance space, and spatial separation is enforced within each frame, which together prevent slots from locking onto the static background when the tracked objects move.
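The two constraints can be illustrated with a toy sketch. It splits each slot into an appearance vector and a pose. A temporal term compares only appearance across consecutive frames, so a slot that changes position is not penalised. A spatial term penalises two slots that claim the same pixels in one frame. The names `Slot`, `temporal_alignment_loss` and `spatial_separation_loss`, the cosine-distance and mask-overlap loss forms, the 2D-centre pose, and index-based slot matching across frames are all hypothetical choices made for illustration. They are not STAITUS's actual formulation.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Slot:
    appearance: List[float]    # "what": feature vector describing the object's look
    pose: Tuple[float, float]  # "where": hypothetical 2D centre (x, y) in [0, 1]
    mask: List[float]          # soft assignment of each pixel to this slot


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def temporal_alignment_loss(slots_t: List[Slot], slots_next: List[Slot]) -> float:
    """Hypothetical loss: mean (1 - cosine) between slot k's appearance at t and t+1.

    Only `appearance` is compared; `pose` is never read, so a slot that follows
    a moving object is not penalised for changing position.
    Assumes slot k in both frames refers to the same object (index matching).
    """
    assert len(slots_t) == len(slots_next) > 0
    losses = [
        1.0 - cosine_similarity(a.appearance, b.appearance)
        for a, b in zip(slots_t, slots_next)
    ]
    return sum(losses) / len(losses)


def spatial_separation_loss(slots: List[Slot]) -> float:
    """Hypothetical loss: sum over slot pairs of their mean per-pixel mask overlap.

    Zero when no pixel is claimed by two slots in the frame; grows as slots
    compete for the same region.
    """
    n_pixels = len(slots[0].mask)
    total = 0.0
    for i in range(len(slots)):
        for j in range(i + 1, len(slots)):
            overlap = sum(a * b for a, b in zip(slots[i].mask, slots[j].mask))
            total += overlap / n_pixels
    return total


if __name__ == "__main__":
    # One object moves from top-left to bottom-right while keeping its appearance.
    frame_t = [Slot([1.0, 0.0], (0.1, 0.1), [1.0, 1.0, 0.0, 0.0])]
    frame_next = [Slot([2.0, 0.0], (0.9, 0.9), [0.0, 0.0, 1.0, 1.0])]
    print(temporal_alignment_loss(frame_t, frame_next))  # 0.0: the pose change is free

    disjoint = [Slot([1.0, 0.0], (0.25, 0.5), [1.0, 1.0, 0.0, 0.0]),
                Slot([0.0, 1.0], (0.75, 0.5), [0.0, 0.0, 1.0, 1.0])]
    shared = [Slot([1.0, 0.0], (0.25, 0.5), [1.0, 1.0, 0.0, 0.0]),
              Slot([0.0, 1.0], (0.25, 0.5), [1.0, 1.0, 0.0, 0.0])]
    print(spatial_separation_loss(disjoint))  # 0.0
    print(spatial_separation_loss(shared))    # 0.5
```

In this sketch, keeping pose out of the temporal term is what gives a slot freedom to move with its object. If pose were included, a slot sitting on an unchanging background region would score well simply by staying put. The separation term adds a within-frame pressure for slots to cover different pixels.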