Data Selection Through Iterative Self-Filtering for Vision-Language Settings
Researchers propose Self-Filtering, a bootstrapped method that trains a CLIP model on an evolving dataset chosen through iterative self-filtering. To mitigate noise in large-scale vision-language datasets, the approach balances filtered samples that have a high probability of being clean with diverse examples drawn from the entire distribution.
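
The selection loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary does not say how samples are scored or in what proportions they are mixed, so the per-pair scores (imagined here as image-text similarities from the current model), the `keep_frac` and `explore_frac` parameters, the uniform sampling of the diverse examples, the choice to start from the full pool, and the `train_fn` and `score_fn` callables are all hypothetical placeholders.

```python
import numpy as np


def select_round(scores, keep_frac=0.5, explore_frac=0.1, rng=None):
    """Choose sample indices for the next training round.

    scores: one score per image-text pair from the current model
        (assumed: higher means more likely clean).
    keep_frac: fraction of the pool kept as top-scoring samples (assumed).
    explore_frac: fraction of the pool drawn uniformly from the remaining
        samples to preserve diversity (assumed).
    """
    rng = np.random.default_rng() if rng is None else rng
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[0]
    n_keep = int(round(keep_frac * n))
    n_explore = min(int(round(explore_frac * n)), n - n_keep)

    order = np.argsort(-scores, kind="stable")  # highest score first
    clean = order[:n_keep]                      # likely-clean samples
    rest = order[n_keep:]                       # everything else
    if n_explore > 0:
        explore = rng.choice(rest, size=n_explore, replace=False)
    else:
        explore = np.empty(0, dtype=order.dtype)
    return np.sort(np.concatenate([clean, explore]))


def self_filter(pool_size, train_fn, score_fn, rounds=3, **select_kwargs):
    """Alternate training and re-selection for a fixed number of rounds.

    train_fn(indices) -> model and score_fn(model) -> scores over the whole
    pool are hypothetical callables supplied by the caller.
    """
    selected = np.arange(pool_size)  # assumption: first round uses all data
    for _ in range(rounds):
        model = train_fn(selected)
        scores = score_fn(model)
        selected = select_round(scores, **select_kwargs)
    return selected
```

In this sketch, scoring always covers the whole pool rather than only the current subset, so a pair dropped in one round can be selected again later if the model's score for it rises.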