arXiv cs.AI · 6d ago · research

Learnable Global Merging for Variable-Length Tokenization in Diffusion Transformers

A novel variable-length tokenizer uses learnable global merging to enable cross-length representation alignment in diffusion models. This data-independent approach overcomes position-dependent semantics and improves the quality-compute trade-off on ImageNet 256×256 generation compared to prior methods.

Importance 2/3 New feature vs. leaders arXiv cs.AI Allen AI Hugging Face OpenAI Evaluation & benchmarks Image generation Training methods
