Critical Percolation as a Synthetic Data Model for Interpretability

A new synthetic dataset built from critical mean-field percolation clusters offers a data model that is both realistic and analytically tractable, and that has hierarchical structure. The data consist of sparse, fractal clusters whose sizes follow a power-law distribution, together with latent variables that generate the target values through a taxonomic hierarchy. These ground-truth latent variables can be linearly decoded from the activations of neural networks trained on the data, which indicates that the networks represent them in a readily interpretable form.
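The summary does not say exactly how the clusters are generated. The sketch below assumes the standard mean-field picture: a critical percolation cluster is grown as a Galton-Watson branching process with Poisson(1) offspring. This is the local limit of a vertex's component in an Erdős-Rényi graph with edge probability 1/n. Under that assumption the cluster size follows the Borel distribution, P(S = n) = e^{-n} n^{n-1} / n!, whose tail decays as n^{-3/2}. That tail is one concrete instance of the power-law size distribution described above. The function names, the `max_size` cap, and the use of tree depth as a taxonomic level are hypothetical illustration choices, not the dataset's actual implementation.

```python
import math
import random
from collections import Counter


def poisson1(rng: random.Random) -> int:
    """Sample Poisson(1) with Knuth's multiplication method (stdlib has no Poisson)."""
    threshold = math.exp(-1.0)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_cluster(rng: random.Random, max_size: int = 2_000):
    """Grow one critical mean-field cluster as a Poisson(1) Galton-Watson tree.

    Assumption: this branching-process construction stands in for the
    dataset's real generator. Returns a parent list (parents[0] == -1 for
    the root), or None if the cluster would exceed max_size, since critical
    clusters are finite but heavy-tailed.
    """
    parents = [-1]
    frontier = [0]
    while frontier:
        node = frontier.pop()
        for _ in range(poisson1(rng)):
            if len(parents) >= max_size:
                return None
            parents.append(node)
            frontier.append(len(parents) - 1)
    return parents


def depths(parents):
    """Depth of each node; one hypothetical way to read off a taxonomic level.

    Children are always appended after their parent, so parent indices are
    smaller than child indices and one left-to-right pass suffices.
    """
    d = [0] * len(parents)
    for i in range(1, len(parents)):
        d[i] = d[parents[i]] + 1
    return d


def borel_pmf(n: int) -> float:
    """Exact size law of a critical Poisson(1) tree: e^{-n} n^{n-1} / n!."""
    return math.exp(-n + (n - 1) * math.log(n) - math.lgamma(n + 1))


if __name__ == "__main__":
    rng = random.Random(0)
    clusters = [sample_cluster(rng) for _ in range(2_000)]
    sizes = [len(c) for c in clusters if c is not None]
    counts = Counter(sizes)
    # Compare empirical size frequencies with the Borel pmf.
    for n in (1, 2, 4, 8):
        print(n, round(counts[n] / len(sizes), 4), round(borel_pmf(n), 4))
```

Comparing the log-pmf at n and 2n gives a slope approaching -1.5, which is the mean-field cluster-size exponent. The cap exists only to keep sampling time bounded. At criticality the expected cluster size is infinite, so an uncapped sampler can occasionally run for a very long time.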