Word2Vec's Performance in Toki Pona's Minimal Vocabulary
This study evaluates how well Word2Vec captures semantic relationships in Toki Pona, a constructed language whose core vocabulary has only about 130 words. Using a corpus of 1.4 million sentences, it finds that tokens outside the core vocabulary do not disrupt the structure of the embeddings and may even bring semantically similar words closer together in vector space. The results indicate that Word2Vec's effectiveness depends more on distributional patterns than on vocabulary size, even under extreme lexical reduction.
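The comparison described in the summary can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the study's actual pipeline. The summary does not say how similarity was measured, so the helper names (`strip_non_core`, `mean_pair_similarity`), the choice of mean cosine similarity over word pairs, the vocabulary subset, and the toy vectors are all hypothetical. The idea is to train one Word2Vec model on the full corpus and another on a copy with non-core tokens removed (for example with gensim's `Word2Vec` class), then check whether related word pairs score higher in one space than in the other.

```python
import math
from typing import AbstractSet, Dict, Iterable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def strip_non_core(sentences: Iterable[List[str]], core_vocab: AbstractSet[str]) -> List[List[str]]:
    """Hypothetical preprocessing: keep only tokens that belong to the core vocabulary."""
    return [[tok for tok in sentence if tok in core_vocab] for sentence in sentences]


def mean_pair_similarity(embeddings: Dict[str, Vector], pairs: Iterable[Tuple[str, str]]) -> float:
    """Average cosine similarity over word pairs; pairs with a missing word are skipped."""
    scores = [
        cosine(embeddings[a], embeddings[b])
        for a, b in pairs
        if a in embeddings and b in embeddings
    ]
    return sum(scores) / len(scores) if scores else float("nan")


# Toy example: the vocabulary subset and the 3-d vectors are made up for illustration.
core = {"jan", "li", "toki", "mi", "moku", "e", "pan"}
print(strip_non_core([["jan", "Sonja", "li", "toki"]], core))  # [['jan', 'li', 'toki']]

emb = {"moku": [1.0, 0.0, 0.0], "pan": [0.8, 0.6, 0.0], "telo": [0.0, 1.0, 0.0]}
print(mean_pair_similarity(emb, [("moku", "pan"), ("moku", "telo")]))  # ≈ 0.4
```

Running `mean_pair_similarity` with the same pair list against both trained models gives a rough single-number check of whether non-core tokens pull related words together; a more careful evaluation would rely on a curated list of related pairs rather than a handful of examples.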