Researchers introduce layer-specific positional embedding scaling (LPES) to address the "lost-in-the-middle" problem in large language models, where information located in the middle of a long-context input tends to receive too little attention. The method assigns a distinct scaling factor to each transformer layer to balance the attention distribution, without fine-tuning any parameters or increasing inference latency.
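
As a rough illustration of what a per-layer scaling factor could look like, the sketch below assumes rotary position embeddings and assumes the factor divides the position index before the rotation angles are computed. The summary does not state either detail, so `rope_angles`, `apply_rope`, and the values in `layer_scales` are hypothetical and should not be read as the authors' implementation.

```python
import math

def rope_angles(position, head_dim, scale=1.0, base=10000.0):
    """Rotary angles for one token position.

    Assumption: the per-layer factor `scale` divides the position index,
    so a larger factor compresses distances between tokens in that layer.
    """
    scaled_pos = position / scale
    return [scaled_pos * base ** (-2 * i / head_dim) for i in range(head_dim // 2)]

def apply_rope(vec, angles):
    """Rotate each consecutive pair (vec[2i], vec[2i+1]) by angles[i]."""
    out = []
    for i, theta in enumerate(angles):
        x, y = vec[2 * i], vec[2 * i + 1]
        out += [x * math.cos(theta) - y * math.sin(theta),
                x * math.sin(theta) + y * math.cos(theta)]
    return out

# Hypothetical factors, one per layer of a 4-layer toy model.
layer_scales = [1.0, 1.5, 2.0, 4.0]
query = [1.0, 0.0, 0.5, 0.5]

# The same query at position 8 is rotated differently in each layer.
per_layer_queries = [
    apply_rope(query, rope_angles(8, len(query), scale=s)) for s in layer_scales
]
```

Because the factors only change how the angles are computed, this kind of scaling adds no extra forward passes, which is consistent with the claim that inference latency is unaffected.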

  • LPES selects each layer's scaling factor with a genetic algorithm, incorporating Bézier curves to significantly reduce the search space.
  • The approach avoids high latency and suboptimal hand-crafted scaling strategies associated with existing multi-scale rotary position embedding methods.
  • Extensive experiments show consistent improvements across multiple long-context benchmarks, yielding up to an 11.2% accuracy gain on key-value retrieval datasets.
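
The search described in the first bullet can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes a cubic Bézier curve over normalized layer depth, so that four control values determine every layer's factor, and it substitutes a toy fitness function for a real long-context benchmark score. The names `bezier_scales`, `evolve`, and `toy_fitness`, the bounds `lo` and `hi`, and all hyperparameters are hypothetical.

```python
import random

def bezier_scales(control_points, num_layers):
    """Per-layer factors from a cubic Bezier curve (Bernstein form).

    Four control values define the whole profile, so the search covers
    4 numbers instead of one free factor per layer.
    """
    p0, p1, p2, p3 = control_points
    scales = []
    for layer in range(num_layers):
        t = layer / (num_layers - 1)  # normalized depth in [0, 1]
        scales.append((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
                      + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)
    return scales

def evolve(fitness, num_layers, pop_size=20, generations=30,
           lo=1.0, hi=8.0, seed=0):
    """Tiny genetic algorithm over Bezier control values (higher fitness wins)."""
    rng = random.Random(seed)
    score = lambda cp: fitness(bezier_scales(cp, num_layers))
    population = [[rng.uniform(lo, hi) for _ in range(4)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[:pop_size // 2]  # keep the better half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            # Gaussian mutation, clamped to the allowed range.
            child = [min(hi, max(lo, g + rng.gauss(0, 0.3))) for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=score)

def toy_fitness(scales):
    """Stand-in objective: closeness to an arbitrary linear profile.

    A real search would score each candidate on a long-context task.
    """
    n = len(scales)
    target = [1.0 + 3.0 * i / (n - 1) for i in range(n)]
    return -sum((s - t) ** 2 for s, t in zip(scales, target))

best_controls = evolve(toy_fitness, num_layers=32)
best_scales = bezier_scales(best_controls, 32)
```

With 32 layers, each candidate in this sketch is described by 4 values rather than 32, which is the kind of search-space reduction the bullet refers to. The curve also keeps neighboring layers' factors smooth, at the cost of ruling out abrupt layer-to-layer changes.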

This technique effectively mitigates positional attention bias in transformers, offering a scalable solution for improving information retention in long-context scenarios without compromising inference speed.