MultiHashFormer: Hash-based Generative Language Models
The paper introduces MultiHashFormer, a framework that enables hash-based autoregression in causal language models by representing each token as a unique signature of multiple discrete hash IDs. The model compresses each signature into a latent vector for Transformer processing and then maps the output back to text. Because every signature is unique, this addresses the many-to-one collision problem that previously prevented hashing from being used in generative settings.
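To make the signature idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes seeded BLAKE2b as the hash family, a toy six-token vocabulary, and a simple seed search to obtain collision-free signatures; the names `hash_id`, `signature`, and `build_signature_table` are hypothetical. It shows how several small hash IDs, each of which may collide on its own, can together identify a token uniquely and be mapped back to text.

```python
import hashlib
from typing import Dict, List, Tuple

Signature = Tuple[int, ...]


def hash_id(token: str, seed: int, num_buckets: int) -> int:
    """Map a token to a single bucket ID with a seeded hash (assumed: BLAKE2b)."""
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little") % num_buckets


def signature(token: str, base_seed: int, num_hashes: int, num_buckets: int) -> Signature:
    """Combine several seeded hash IDs into one signature tuple."""
    return tuple(hash_id(token, base_seed + k, num_buckets) for k in range(num_hashes))


def build_signature_table(
    vocab: List[str], num_hashes: int, num_buckets: int, max_tries: int = 100
) -> Tuple[int, Dict[Signature, str]]:
    """Search for a base seed under which every token has a distinct signature.

    The seed search is an illustrative assumption, not a method taken from the paper.
    """
    for attempt in range(max_tries):
        base_seed = attempt * num_hashes  # keep seed sets disjoint across attempts
        table: Dict[Signature, str] = {}
        for token in vocab:
            # setdefault keeps the first token on a clash, so a clash shrinks the table
            table.setdefault(signature(token, base_seed, num_hashes, num_buckets), token)
        if len(table) == len(vocab):
            return base_seed, table
    raise ValueError("no collision-free seed found; increase num_hashes or num_buckets")


# Toy vocabulary (assumed for illustration only).
VOCAB = ["the", "cat", "sat", "on", "a", "mat"]
NUM_HASHES, NUM_BUCKETS = 3, 16

BASE_SEED, TABLE = build_signature_table(VOCAB, NUM_HASHES, NUM_BUCKETS)

for tok in VOCAB:
    sig = signature(tok, BASE_SEED, NUM_HASHES, NUM_BUCKETS)
    # Decoding is a table lookup because each signature belongs to exactly one token.
    print(tok, sig, TABLE[sig])
```

With three hash IDs over 16 buckets each, the signature space has 16^3 = 4096 entries, while any single hash offers only 16 buckets. In this sketch the model would predict one small ID per hash function rather than one index over the whole vocabulary, and the unique signature is what makes the mapping back to text unambiguous.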