The authors propose neural classification trees (NCT), a framework that addresses spurious correlations in machine learning models by encoding subgroup structure directly in a tree-shaped architecture, which in turn makes the model more robust.
- NCT routes each sample to an "easy" or "hard" node based on prediction correctness and reuses these routes as pseudo-labels for subsequent iterations.
- The method disentangles conflicting subgroups without requiring explicit subgroup supervision.
- Experiments on five benchmarks show that the learned tree topology consistently isolates minority subgroups, which makes the resulting structure interpretable.
- The approach yields competitive robustness compared to state-of-the-art methods while offering a transparent mapping between model architecture and data structure.
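
The routing step described in the bullets above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `EASY`, `HARD`, `route_samples`, and `split_by_route` are hypothetical, the predictions are toy values rather than outputs of a trained model, and only a single routing pass is shown, since the summary does not specify how the pseudo-labels are consumed in later iterations.

```python
from typing import List, Sequence, Tuple

# Hypothetical node identifiers; the summary only names the nodes "easy" and "hard".
EASY, HARD = 0, 1


def route_samples(predictions: Sequence[int], labels: Sequence[int]) -> List[int]:
    """Route each sample to EASY if it was predicted correctly, otherwise to HARD."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    return [EASY if p == y else HARD for p, y in zip(predictions, labels)]


def split_by_route(routes: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Return the sample indices assigned to the easy node and to the hard node."""
    easy = [i for i, r in enumerate(routes) if r == EASY]
    hard = [i for i, r in enumerate(routes) if r == HARD]
    return easy, hard


# Toy data (assumed for illustration only).
predictions = [0, 1, 1, 0, 1]
labels = [0, 1, 0, 0, 0]

# The routes double as pseudo-labels that a later iteration could train on.
routes = route_samples(predictions, labels)
easy_idx, hard_idx = split_by_route(routes)
```

In this sketch no subgroup annotation is used: the split comes only from whether each prediction matches its class label, which is consistent with the claim that the method needs no explicit subgroup supervision.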