This work develops a theory of how semantic paraphrases fool financial sentiment classifiers, based on the worst-case displacement of the target model's representations. The attackability index λ*(x) is the largest generalised eigenvalue of a matrix pencil (A, B); it yields closed-form predictions and robustness certificates for affine readouts. The framework connects continuous perturbation theory to discrete paraphrase search and is validated empirically on real financial text classifiers.
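As an illustration of the central quantity, the largest generalised eigenvalue of a pencil (A, B) can be computed as sketched below. This is a minimal sketch, not the authors' implementation: the summary does not define A and B, so the code assumes A is symmetric and B is symmetric positive definite, in which case the largest generalised eigenvalue equals the maximum of the ratio δᵀAδ / δᵀBδ over nonzero δ. The function name `largest_generalised_eigenpair` and the toy matrices are hypothetical.

```python
import numpy as np


def largest_generalised_eigenpair(A, B):
    """Return (lam, v) with A v = lam * B v for the largest lam.

    Assumption (not from the source): A is symmetric and B is
    symmetric positive definite.
    """
    L = np.linalg.cholesky(B)        # B = L L^T
    Y = np.linalg.solve(L, A)        # Y = L^{-1} A
    C = np.linalg.solve(L, Y.T)      # C = L^{-1} A L^{-T}
    C = 0.5 * (C + C.T)              # remove round-off asymmetry
    w, U = np.linalg.eigh(C)         # eigenvalues in ascending order
    lam = w[-1]
    v = np.linalg.solve(L.T, U[:, -1])  # map back: v = L^{-T} u
    return lam, v


# Toy pencil, chosen only for illustration.
A = np.array([[2.0, 0.0], [0.0, 3.0]])
B = np.array([[1.0, 0.0], [0.0, 2.0]])
lam, v = largest_generalised_eigenpair(A, B)

# Residual of the generalised eigen-equation A v = lam B v.
residual = np.linalg.norm(A @ v - lam * (B @ v))
```

Reducing the pencil through a Cholesky factor of B keeps the problem symmetric, so `np.linalg.eigh` applies and the returned eigenvalues are real. For the toy pencil the generalised eigenvalues are 2/1 and 3/2, so `lam` is 2 up to round-off.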
Generalised Eigenvalue Geometry of Semantic Adversarial Attacks