Distance-Adaptive Representation (DAR) is a new attention mechanism that assigns richer representations to nearby tokens and lower-dimensional representations to distant ones. It matches full-dimensional performance across multiple model scales and under fine-tuning, and it outperforms uniform dimensionality reduction.
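To make the idea concrete, here is a minimal sketch of how a distance-dependent dimensionality could enter the attention score. It is an illustration under stated assumptions, not the method as published: the summary does not say how the reduction is performed, so this sketch assumes that distant keys are compared using only the first `far_dim` coordinates, that "nearby" means within `near_window` positions, that attention is causal, and that scores are scaled by the square root of the number of dimensions used. The function name and all parameters are hypothetical.

```python
import math


def dar_attention_weights(q, k, near_window, far_dim):
    """Causal attention weights with a distance-dependent number of dimensions.

    q, k: lists of n vectors, each of length d (queries and keys).
    near_window: hypothetical cutoff; keys within this distance use all d dims.
    far_dim: hypothetical reduced dimension used for more distant keys.
    Truncating to the leading coordinates is an assumption of this sketch.
    """
    n, d = len(q), len(q[0])
    weights = []
    for i in range(n):
        scores = []
        for j in range(i + 1):  # causal: only keys at or before position i
            # Nearby keys get the full representation, distant keys a reduced one.
            dims = d if i - j <= near_window else far_dim
            dot = sum(q[i][t] * k[j][t] for t in range(dims))
            scores.append(dot / math.sqrt(dims))
        # Numerically stable softmax over the visible keys.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        # Future positions receive zero weight.
        weights.append([e / z for e in exps] + [0.0] * (n - i - 1))
    return weights
```

Setting `far_dim` equal to the full dimension, or `near_window` at least the sequence length, recovers ordinary full-dimensional attention, which gives a convenient baseline for comparison. A uniform reduction, by contrast, would correspond to using `far_dim` for every key regardless of distance.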
Distance-Adaptive Representation for Attention