Researchers introduce layer-specific positional embedding scaling (LPES) to address the "lost-in-the-middle" problem in large language models. In this problem, information located in the middle of a long input is often underused compared with information near its beginning or end. LPES assigns a distinct scaling factor to each transformer layer to produce a more balanced attention distribution across positions, without fine-tuning any model parameters or increasing inference latency.
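The summary does not say exactly how a layer's scaling factor enters the positional embedding. The sketch below assumes rotary position embeddings (RoPE) and a position-interpolation-style rule: each layer divides position indices by its own factor `s`, so that layer sees a token distance Δ as Δ/s. The function names, the `layer_scales` values, and the toy query/key vectors are all hypothetical illustrations, not the authors' code.

```python
import math
from typing import List

def apply_rope(x: List[float], position: int, scale: float = 1.0, base: float = 10000.0) -> List[float]:
    """Rotate consecutive pairs (x[2i], x[2i+1]) by the RoPE angle for `position`.

    ASSUMPTION: the position index is divided by `scale` before computing angles
    (position-interpolation-style scaling), not necessarily LPES's exact rule.
    """
    d = len(x)
    assert d % 2 == 0, "RoPE needs an even head dimension"
    out = [0.0] * d
    for i in range(d // 2):
        theta = base ** (-2 * i / d)          # standard RoPE frequency for pair i
        angle = (position / scale) * theta    # scaled position -> rotation angle
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out

def attention_logit(q: List[float], k: List[float], q_pos: int, k_pos: int, scale: float) -> float:
    """Unnormalized attention score between one query and one key under scaled RoPE."""
    qr = apply_rope(q, q_pos, scale)
    kr = apply_rope(k, k_pos, scale)
    return sum(a * b for a, b in zip(qr, kr))

# Hypothetical per-layer factors, one per layer; LPES would search for these.
layer_scales = [1.0, 1.0, 1.5, 2.0, 2.5, 3.0]

q = [0.3, -1.2, 0.8, 0.5, -0.4, 1.1, 0.2, -0.7]
k = [1.0, 0.4, -0.6, 0.9, 0.3, -0.2, 0.5, 0.8]
for layer, s in enumerate(layer_scales):
    # The same query/key pair, 500 tokens apart, seen through each layer's scaling.
    print(layer, round(attention_logit(q, k, q_pos=600, k_pos=100, scale=s), 4))
```

Because the score depends only on the scaled relative distance, a layer with `s = 2.0` scores a 500-token gap exactly as an unscaled layer scores a 250-token gap. Giving different layers different factors therefore changes how strongly each layer's attention decays with distance.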
- LPES uses a genetic algorithm that incorporates Bézier curves to efficiently search for well-performing scaling factors for each layer, significantly reducing the search space.
- The approach avoids two drawbacks of existing multi-scale rotary position embedding (RoPE) methods: their high latency and their reliance on suboptimal, hand-crafted scaling strategies.
- Extensive experiments show consistent improvements across multiple long-context benchmarks, yielding up to an 11.2% accuracy gain on key-value retrieval datasets.
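The Bézier-constrained genetic search described in the list above can be sketched as follows. This rests on one assumption: a 1-D Bézier curve over normalized layer depth, defined by a few control points, produces every layer's factor. The GA then evolves only those control points rather than one value per layer. The selection, crossover, and mutation choices, the `[lo, hi]` bounds, and the toy `fitness` function are hypothetical stand-ins. A real run would score each candidate by long-context accuracy on held-out data.

```python
import random
from math import comb
from typing import Callable, List, Tuple

def bezier_layer_scales(control: List[float], num_layers: int) -> List[float]:
    """Evaluate a 1-D Bézier curve (Bernstein form) at num_layers evenly spaced t in [0, 1]."""
    n = len(control) - 1
    scales = []
    for layer in range(num_layers):
        t = layer / (num_layers - 1) if num_layers > 1 else 0.0
        scales.append(sum(comb(n, i) * (1 - t) ** (n - i) * t ** i * p
                          for i, p in enumerate(control)))
    return scales

def genetic_search(fitness: Callable[[List[float]], float], num_layers: int,
                   num_control: int = 4, pop_size: int = 20, generations: int = 30,
                   lo: float = 1.0, hi: float = 4.0, mutation_std: float = 0.2,
                   seed: int = 0) -> Tuple[List[float], List[float]]:
    """Evolve Bézier control points; return (best per-layer scales, best fitness per generation)."""
    rng = random.Random(seed)

    def score(ind: List[float]) -> float:
        return fitness(bezier_layer_scales(ind, num_layers))

    population = [[rng.uniform(lo, hi) for _ in range(num_control)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        history.append(score(ranked[0]))
        elite = ranked[: max(2, pop_size // 4)]  # keep the best quarter unchanged (elitism)
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]  # uniform crossover
            child = [min(hi, max(lo, g + rng.gauss(0.0, mutation_std))) for g in child]  # clipped mutation
            children.append(child)
        population = elite + children
    best = max(population, key=score)
    return bezier_layer_scales(best, num_layers), history

# Hypothetical toy fitness: prefer factors rising linearly from 1.0 to 3.0 across layers.
NUM_LAYERS = 8
target = [1.0 + 2.0 * l / (NUM_LAYERS - 1) for l in range(NUM_LAYERS)]
fitness = lambda scales: -sum((s - t) ** 2 for s, t in zip(scales, target))

best_scales, history = genetic_search(fitness, NUM_LAYERS)
print([round(s, 2) for s in best_scales])
```

With four control points the search is over 4 numbers instead of one per layer, and neighboring layers receive smoothly varying factors. Because the Bernstein weights are non-negative and sum to 1, every layer's factor also stays within the control points' range.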
This technique mitigates positional attention bias in transformers and offers a scalable way to improve information retention in long-context settings without slowing inference.