Energy-Based Transformers Predict Reading Difficulty

Energy-based transformers show robust predictive power for reading times across multiple corpora, outperforming surprisal in all cases. The energy measure captures known object/subject asymmetries in relative clause processing and subsumes both attention entropy and surprisal, which suggests that it can serve as a unified predictor of reading difficulty.
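
For readers unfamiliar with the two baseline predictors named above, the sketch below gives their standard information-theoretic definitions: surprisal is the negative log probability of a word in context, and attention entropy is the Shannon entropy of an attention distribution. It assumes that per-word probabilities and attention weights have already been extracted from some language model. The function names are illustrative and not taken from the work summarized here, and the energy measure itself is not reproduced because this summary does not define it.

```python
import math
from typing import Sequence


def surprisal(prob: float) -> float:
    """Surprisal in bits of a word, given its probability in context."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("prob must be in (0, 1]")
    return -math.log2(prob)


def attention_entropy(weights: Sequence[float]) -> float:
    """Shannon entropy in bits of a single attention distribution.

    The weights are renormalized first, so unnormalized non-negative
    scores are accepted (an assumption made for convenience).
    """
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    probs = [w / total for w in weights]
    # Terms with p == 0 contribute nothing to the entropy.
    return -sum(p * math.log2(p) for p in probs if p > 0)


# A word with in-context probability 0.25 carries 2 bits of surprisal.
bits = surprisal(0.25)

# Attention spread uniformly over 4 tokens has 2 bits of entropy;
# attention focused on a single token has 0 bits.
spread = attention_entropy([0.25, 0.25, 0.25, 0.25])
focused = attention_entropy([1.0, 0.0, 0.0])
```

Both quantities are computed per word and would typically enter a regression on reading times as competing predictors, which is the setting in which one measure can be said to subsume another.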