Story Operators: Decomposing the Original-to-Sequel Transformation in Embedding Space
This study models literary transformations as geometric operations in a sentence-embedding space, using all-mpnet-base-v2 vectors computed on the PG19 corpus. The author calculates displacement vectors between original novels and their sequels and decomposes each displacement along a content basis derived via PCA. Analysis of thirteen verified same-author pairs reveals a taxonomy of three sequel types: formulaic, concentrated, and compositional. Formulaic transformations involve minimal rank changes, as in Doyle's Holmes collections, with a norm of 0.12. Concentrated transformations are dominated by a single axis, exemplified by Alcott's Little Women to Little Men, where 75% of the change falls on one axis. Compositional transformations are spread across many small axes, as seen in works by Twain, Burroughs, and Nesbit. For Tom Sawyer to Huckleberry Finn, the dominant axis is structural: it reflects a shift from domesticity to picaresque adventure rather than surface features such as vernacular voice. The geometric findings are corroborated by Mark Twain's documented authorial intent in his letters to Howells.
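The decomposition described above can be illustrated with a minimal sketch. It is not the author's implementation: the function names `pca_basis` and `decompose_displacement` are hypothetical, random vectors stand in for all-mpnet-base-v2 embeddings, and the way a whole book is reduced to one vector is left open because the abstract does not specify it. The sketch only shows a displacement vector being projected onto a PCA basis and the share of change per axis being read off.

```python
import numpy as np


def pca_basis(X, k):
    """Return the top-k principal axes of X as orthonormal rows, shape (k, d)."""
    Xc = X - X.mean(axis=0, keepdims=True)          # center the corpus vectors
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:k]                                    # rows are sorted by variance


def decompose_displacement(orig_vec, sequel_vec, basis):
    """Project the original-to-sequel displacement onto the basis.

    Returns the displacement norm, the per-axis coefficients, and each
    axis's share of the squared displacement captured by the basis.
    """
    delta = sequel_vec - orig_vec
    coeffs = basis @ delta
    energy = coeffs ** 2
    total = energy.sum()
    shares = energy / total if total > 0 else np.zeros_like(energy)
    return np.linalg.norm(delta), coeffs, shares


# Synthetic stand-in data (assumption): 200 "book" vectors in 16 dimensions.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(200, 16))
basis = pca_basis(corpus, k=8)

# Build a "concentrated" sequel: the original shifted along a single axis.
original = corpus[0]
sequel = original + 0.5 * basis[2]

norm, coeffs, shares = decompose_displacement(original, sequel, basis)
dominant_axis = int(np.argmax(shares))
```

In this toy case all of the change lands on one axis, which is the extreme form of the concentrated type; a compositional pair would instead show many small entries in `shares`. Note that the shares are computed within the span of the retained axes, so any displacement outside that span is not counted.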