Reverse-Engineering Transformer Attention with Executable Programs

A new method uses program synthesis to generate Python programs that reproduce the attention patterns of individual heads in transformer models. These programs achieve over 75% average Intersection-over-Union (IoU) similarity on held-out data, and they can replace up to 25% of a model's attention heads at a cost of only a 16% average increase in perplexity.
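
To make the Intersection-over-Union figure concrete, here is a minimal sketch of how a program's predicted attention pattern might be scored against a real head. Everything in it is an illustrative assumption rather than the method's actual pipeline: the "previous token" program is a hypothetical example of what a synthesized program could look like, the attention matrix is toy data, and binarizing attention weights at a threshold of 0.5 before computing IoU is one plausible choice that the summary above does not specify.

```python
from typing import List

Mask = List[List[int]]


def previous_token_program(tokens: List[str]) -> Mask:
    """Hypothetical synthesized program: each position attends to the token before it."""
    n = len(tokens)
    mask = [[0] * n for _ in range(n)]
    for i in range(n):
        j = max(i - 1, 0)  # position 0 has no predecessor, so it attends to itself
        mask[i][j] = 1
    return mask


def binarize(attention: List[List[float]], threshold: float = 0.5) -> Mask:
    """Turn soft attention weights into a 0/1 mask (the threshold is an assumption)."""
    return [[1 if w >= threshold else 0 for w in row] for row in attention]


def iou(pred: Mask, target: Mask) -> float:
    """Intersection-over-Union of two binary masks of the same shape."""
    pairs = [(p, t) for p_row, t_row in zip(pred, target) for p, t in zip(p_row, t_row)]
    intersection = sum(p & t for p, t in pairs)
    union = sum(p | t for p, t in pairs)
    return intersection / union if union else 1.0  # two empty masks count as identical


# Toy causal attention weights for one head over four tokens (made-up numbers).
tokens = ["the", "cat", "sat", "down"]
attention = [
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.2, 0.7, 0.1, 0.0],
    [0.1, 0.6, 0.3, 0.0],  # this row deviates from the previous-token pattern
]

pred = previous_token_program(tokens)
target = binarize(attention)
score = iou(pred, target)
print(f"IoU = {score:.2f}")  # prints "IoU = 0.60"
```

Here the program matches the head on three of four rows. The intersection holds 3 cells and the union holds 5, giving an IoU of 0.60. Averaging such a score over held-out inputs would yield a figure comparable in spirit to the 75% reported above, under the stated assumptions.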