LangMAP extends UnigramLM to produce language-specific tokenizations from a single shared vocabulary, so multilingual models can be trained or adapted without changing the vocabulary. It improves alignment with morphological boundaries in natural languages and with AST leaf boundaries in programming languages, and it improves grammatical acceptability in target languages, though its benefits on knowledge-based tasks vary.
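The summary does not say how LangMAP assigns language-specific scores. The sketch below only illustrates the general idea under a loudly stated assumption: each language keeps its own unigram probabilities over one shared piece inventory, and ordinary UnigramLM Viterbi decoding then selects a different segmentation per language. The vocabulary, the probabilities, and the names `SHARED_VOCAB`, `language_table`, and `viterbi_segment` are hypothetical and illustrative; none of them come from LangMAP.

```python
import math
from typing import Dict, List, Set

# HYPOTHETICAL shared vocabulary: both toy languages use exactly these pieces.
CHARS = "unhapies"  # single-character fallback pieces: u, n, h, a, p, i, e, s
SHARED_VOCAB = ["un", "happi", "ness", "unhap", "piness", *CHARS]


def to_log_probs(weights: Dict[str, float]) -> Dict[str, float]:
    """Normalize raw weights into a proper unigram distribution (log space)."""
    total = sum(weights.values())
    return {piece: math.log(w / total) for piece, w in weights.items()}


def language_table(preferred: Set[str]) -> Dict[str, float]:
    """ASSUMPTION: a language is modeled as its own weights over SHARED_VOCAB.

    Illustrative numbers only; not taken from LangMAP.
    """
    weights = {}
    for piece in SHARED_VOCAB:
        if len(piece) == 1:
            weights[piece] = 0.01   # character fallback, same weight everywhere
        elif piece in preferred:
            weights[piece] = 0.1    # pieces this language favors
        else:
            weights[piece] = 0.001  # pieces that exist but are unlikely here
    return to_log_probs(weights)


def viterbi_segment(text: str, logp: Dict[str, float]) -> List[str]:
    """Standard UnigramLM decoding: the max-log-probability segmentation."""
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best score for text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last piece
    best[0] = 0.0
    max_len = max(len(p) for p in logp)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in logp:
                score = best[start] + logp[piece]
                if score > best[end]:
                    best[end] = score
                    back[end] = start
    if best[n] == -math.inf:
        raise ValueError("text cannot be segmented with this vocabulary")
    # Walk the back-pointers from the end to recover the pieces.
    pieces = []
    end = n
    while end > 0:
        start = back[end]
        pieces.append(text[start:end])
        end = start
    return pieces[::-1]


TABLES = {
    "lang_a": language_table({"un", "happi", "ness"}),
    "lang_b": language_table({"unhap", "piness"}),
}

for lang, table in TABLES.items():
    print(lang, viterbi_segment("unhappiness", table))
```

With these toy weights, `lang_a` segments the word as `un | happi | ness` and `lang_b` as `unhap | piness`. Both languages draw on the identical piece inventory, so the vocabulary (and any embedding table indexed by it) stays unchanged; only the choice of pieces differs.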