The Curse of Multiple Mediators: Hidden Interaction Effects in Activation Patching
Re-deriving the activation patching estimand within the framework of causal mediation analysis shows that the natural indirect effect (NIE) captures not only the causal effect transmitted through a specific component but also interaction effects (INT). These INT terms measure how much a component's causal effect depends on the state of the other components in the model, which challenges the assumption that the NIE isolates individual contributions.
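
To make the interaction term concrete, here is a toy sketch in Python. It is not the derivation itself: the two-mediator outcome function `y`, its coefficients, and the clean and corrupted activation values are all hypothetical, and using the corrupted state of the second component as the reference point is an assumption made purely for illustration. The sketch patches component 1 while holding component 2 in each of its two states and reports how much the patching effect changes between them.

```python
# Toy model (hypothetical): the output depends on two components m1 and m2,
# with a multiplicative term c * m1 * m2 that makes them interact.
def y(m1, m2, a=1.0, b=2.0, c=0.5):
    return a * m1 + b * m2 + c * m1 * m2


# Hypothetical activations of each component on the clean and corrupted runs.
m1_clean, m1_corrupt = 1.0, 3.0
m2_clean, m2_corrupt = 2.0, 0.0


def patch_effect_m1(m2_state):
    """Change in output when m1 is swapped from clean to corrupted,
    with m2 held fixed at m2_state."""
    return y(m1_corrupt, m2_state) - y(m1_clean, m2_state)


# Patching m1 while the rest of the model (here: m2) stays in its clean state.
nie_m1 = patch_effect_m1(m2_clean)

# The same patch evaluated with m2 in its corrupted state (assumed reference).
effect_at_reference = patch_effect_m1(m2_corrupt)

# Interaction: how much the effect of m1 depends on the state of m2.
interaction = nie_m1 - effect_at_reference

print("patching effect of m1 (m2 clean):    ", nie_m1)               # 4.0
print("patching effect of m1 (m2 corrupted):", effect_at_reference)  # 2.0
print("interaction term:                    ", interaction)          # 2.0
```

In this toy setting the interaction equals `c * (m1_corrupt - m1_clean) * (m2_clean - m2_corrupt)`, so setting `c=0` makes the patching effect of `m1` identical in both contexts. With a nonzero `c`, the number obtained by patching `m1` alone mixes a context-independent part with a part that depends on `m2`.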