DICE Improves Long-Document Retrieval with Chunk Evidence Aggregation
DICE is a training-free method that splits a long document into chunks, encodes each chunk independently, and aggregates the chunk embeddings into a single document vector. On LongEmbed, it reduces the Evidence Dilution Index in 92.8% of cases and significantly improves retrieval performance on slices longer than 4k tokens across four backbones.
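
The split, encode, aggregate pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not DICE's actual implementation. The summary does not specify the aggregation rule, so the mean of L2-normalized chunk vectors stands in as a placeholder. The names split_into_chunks, toy_encode, and embed_long_document are hypothetical, and toy_encode is a hashed bag-of-words stand-in for a real embedding backbone.

```python
import hashlib
import math
from typing import Callable, List

Vector = List[float]


def split_into_chunks(tokens: List[str], chunk_size: int) -> List[List[str]]:
    """Split a token list into consecutive, non-overlapping chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def l2_normalize(vec: Vector) -> Vector:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def toy_encode(tokens: List[str], dim: int = 8) -> Vector:
    """Hypothetical stand-in encoder: hashed bag of words, unit-normalized.

    A real setup would call an embedding backbone here instead.
    """
    vec = [0.0] * dim
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    return l2_normalize(vec)


def embed_long_document(
    tokens: List[str],
    encode: Callable[[List[str]], Vector],
    chunk_size: int,
) -> Vector:
    """Encode each chunk independently, then aggregate into one vector.

    Assumption: aggregation is a plain mean followed by L2 normalization.
    The method's real aggregation rule is not given in the summary.
    """
    chunks = split_into_chunks(tokens, chunk_size)
    if not chunks:
        raise ValueError("document is empty")
    chunk_vecs = [encode(chunk) for chunk in chunks]
    dim = len(chunk_vecs[0])
    mean = [sum(v[d] for v in chunk_vecs) / len(chunk_vecs) for d in range(dim)]
    return l2_normalize(mean)


doc_tokens = ("retrieval " * 10 + "chunk " * 10).split()
doc_vector = embed_long_document(doc_tokens, toy_encode, chunk_size=4)
```

Because every chunk is encoded on its own, no single encoder call sees more than chunk_size tokens, so the backbone needs no retraining or context extension. Swapping toy_encode for a real encoder and the mean for a different aggregation rule changes only the two marked places.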