Rescaling MLM-Head for Neural Sparse Retrieval

A study finds that large norms in the masked language modeling (MLM) head of pretrained encoders degrade sparse retrieval performance in SPLADE. A simple rescaling of the MLM head at initialization stabilizes training and improves performance, matching or exceeding BERT-SPLADE on multiple benchmarks.
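To make the idea concrete, here is a minimal pure-Python sketch of what an initialization-time rescaling could look like. The summary above does not state the exact rule, so everything here is an assumption for illustration: the hypothetical helpers `mean_row_norm` and `rescale_head`, the choice of the mean row L2 norm as the quantity to control, the `target_norm` parameter, and the decision to scale the bias by the same factor. The `splade_activation` helper uses the standard SPLADE term weighting, log(1 + ReLU(logit)), to show why the scale of the head matters: logits grow linearly with the head's weights, so a larger head produces larger and denser term weights before any training has happened.

```python
import math


def mean_row_norm(weight):
    """Average L2 norm of the rows of `weight` (one row per vocabulary term)."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in weight) / len(weight)


def rescale_head(weight, bias, target_norm):
    """Return scaled copies of `weight` and `bias` plus the factor used.

    Assumption (not taken from the source): a single global factor is chosen
    so that the mean row norm of the weight matrix equals `target_norm`,
    and the bias is scaled by the same factor.
    """
    current = mean_row_norm(weight)
    if current == 0.0:
        raise ValueError("cannot rescale an all-zero weight matrix")
    factor = target_norm / current
    scaled_weight = [[w * factor for w in row] for row in weight]
    scaled_bias = [b * factor for b in bias]
    return scaled_weight, scaled_bias, factor


def splade_activation(hidden, weight, bias):
    """SPLADE-style term weights for one token: log(1 + ReLU(W h + b))."""
    logits = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(weight, bias)
    ]
    return [math.log1p(max(0.0, z)) for z in logits]


# Toy head with two vocabulary terms and a two-dimensional hidden state.
weight = [[3.0, 4.0], [6.0, 8.0]]
bias = [0.0, 0.0]
hidden = [1.0, 1.0]

scaled_weight, scaled_bias, factor = rescale_head(weight, bias, target_norm=1.5)

before = splade_activation(hidden, weight, bias)
after = splade_activation(hidden, scaled_weight, scaled_bias)
```

In this toy setup the rescaled head yields smaller term weights for the same hidden state, which is the kind of effect a rescaling at initialization would have on the sparse representation at the start of training. In a real encoder the rescaling would be applied once to the loaded MLM-head parameters before fine-tuning; the appropriate target value is not given in the summary and would need to be taken from the study itself.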