Computational Stylometry of English Pali Canon Translations Across Pitakas
This study presents a computational stylometric analysis of English translations of the Tipitaka across all three Pitakas, extending previous work on the Sutta Pitaka. The corpus comprises 134,831 segments drawn from Bhikkhu Sujato's translation of the Sutta Pitaka, Bhikkhu Brahmali's translation of the Vinaya Pitaka, I.B. Horner's 1938 Vinaya translation, three English translations of the Abhidhammattha Sangaha, and Vinaya texts from other traditions. For each corpus, the authors compute Zipf rank-frequency distributions, lexical diversity as MATTR-500 (moving-average type-token ratio over a 500-token window), numeral-word density, and vocabulary overlap. All corpora show Zipf-consistent distributions, with R-squared values above 0.989. The Sutta Pitaka and Theravada Vinaya corpora have nearly identical lexical diversity (MATTR-500 of 0.399 and 0.400), whereas the Sangaha corpus is markedly more diverse (0.560). The Sangaha corpus also has the highest numeral-word density (3.26%), reflecting its systematic enumeration of categories. Finally, the Mulasarvastivada Vinaya shows substantial vocabulary overlap with the Theravada Vinaya, whereas two English translations of the same source text share only 24.2% of their vocabulary.
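The abstract does not specify tokenization, the numeral lexicon, or how vocabulary overlap is normalized, so the following is only a minimal sketch of the four metrics under stated assumptions, not the authors' pipeline. It assumes a simple lowercase regex tokenizer, an ordinary-least-squares fit of log frequency on log rank for the Zipf R-squared, a sliding-window MATTR (window of 500 tokens, falling back to plain TTR for shorter texts), a hypothetical list of English cardinal number words plus digit tokens for numeral-word density, and Jaccard similarity of vocabulary sets as the overlap measure. The names `tokenize`, `zipf_r_squared`, `mattr`, `numeral_density`, `vocabulary_overlap`, and `NUMERAL_WORDS` are illustrative.

```python
import math
import re
from collections import Counter

# Assumption: lowercase runs of letters, digits, and apostrophes count as tokens.
TOKEN_RE = re.compile(r"[a-z0-9']+")

# Hypothetical numeral lexicon: English cardinals; digit tokens are counted separately.
NUMERAL_WORDS = set(
    "one two three four five six seven eight nine ten eleven twelve thirteen "
    "fourteen fifteen sixteen seventeen eighteen nineteen twenty thirty forty "
    "fifty sixty seventy eighty ninety hundred thousand".split()
)


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


def zipf_r_squared(tokens: list[str]) -> float:
    """R-squared of an OLS fit of log(frequency) on log(rank)."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return float("nan")  # fit undefined: one type, or all frequencies equal
    # For simple linear regression, R^2 equals the squared Pearson correlation.
    return sxy * sxy / (sxx * syy)


def mattr(tokens: list[str], window: int = 500) -> float:
    """Mean type-token ratio over every contiguous window of `window` tokens."""
    if len(tokens) <= window:
        return len(set(tokens)) / len(tokens)  # assumption: plain TTR fallback
    counts = Counter(tokens[:window])
    ratios = [len(counts) / window]
    for i in range(window, len(tokens)):
        leaving = tokens[i - window]
        counts[leaving] -= 1
        if counts[leaving] == 0:
            del counts[leaving]  # keep len(counts) equal to the window's type count
        counts[tokens[i]] += 1
        ratios.append(len(counts) / window)
    return sum(ratios) / len(ratios)


def numeral_density(tokens: list[str]) -> float:
    """Percentage of tokens that are digit strings or listed number words."""
    hits = sum(1 for t in tokens if t.isdigit() or t in NUMERAL_WORDS)
    return 100.0 * hits / len(tokens)


def vocabulary_overlap(a: list[str], b: list[str]) -> float:
    """Assumed overlap measure: Jaccard similarity of vocabularies, in percent."""
    va, vb = set(a), set(b)
    return 100.0 * len(va & vb) / len(va | vb)
```

A cardinal-word list is a crude proxy: words such as "one" also occur as pronouns, so a real analysis would need a documented lexicon or part-of-speech filtering.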