Researchers have introduced SVD-Surgeon, a training-free method that applies the Optimal Brain Surgeon (OBS) framework to SVD-based compression of large language models. It computes closed-form updates to the retained singular values that compensate for truncation error, and it uses a saliency score to decide which singular values to prune.
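
For context, the truncation error referred to above arises when only the largest singular values of a weight matrix are kept. The following is a minimal NumPy sketch of plain truncated SVD, not of SVD-Surgeon itself; the random matrix, its shape, and the rank `k` are arbitrary stand-ins chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 48))    # hypothetical stand-in for a weight matrix

# Thin SVD: W = U @ diag(s) @ Vt, with s sorted in descending order.
U, s, Vt = np.linalg.svd(W, full_matrices=False)

k = 16                               # number of singular values to keep (arbitrary)
W_k = (U[:, :k] * s[:k]) @ Vt[:k]    # rank-k approximation of W

# The Frobenius-norm truncation error equals the energy of the dropped values.
err = np.linalg.norm(W - W_k)
expected = np.sqrt(np.sum(s[k:] ** 2))

# Storing the two factors takes k * (m + n) numbers instead of m * n.
m, n = W.shape
params_dense = m * n
params_factored = k * (m + n)
```

Here `err` and `expected` agree up to floating-point rounding, which is the error a compensation step would then try to offset in terms of model loss.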

  • The method treats each singular value as a parameter and computes a second-order, closed-form adjustment to the retained values that compensates for the loss caused by the removed ones.
  • It also derives a saliency score for each singular value, which determines which ones are pruned.
  • SVD-Surgeon operates directly on the singular-value factorization, allowing it to layer on top of existing SVD compressors.
  • When applied to SVD-LLM, it improves the perplexity-compression trade-off on the OPT family and LLaMA 2-7B without requiring retraining.
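
The points above can be illustrated with the classic Optimal Brain Surgeon (OBS) formulas applied to a vector of singular values. This is a sketch under stated assumptions, not the authors' implementation: the Hessian `H` is a synthetic positive-definite matrix standing in for the loss curvature with respect to the singular values, and the function names `obs_saliency` and `obs_prune_one` are hypothetical. How SVD-Surgeon actually obtains its Hessian, and how it handles removing many values, is not described in this summary.

```python
import numpy as np

def obs_saliency(s, H_inv):
    # Classic OBS saliency: estimated loss increase from zeroing each
    # parameter, assuming the remaining ones are optimally readjusted.
    return s ** 2 / (2.0 * np.diag(H_inv))

def obs_prune_one(s, H_inv):
    # Remove the least salient singular value and compensate the others.
    sal = obs_saliency(s, H_inv)
    q = int(np.argmin(sal))
    # Closed-form OBS update: delta = -(s_q / [H^-1]_qq) * H^-1 e_q
    delta = -(s[q] / H_inv[q, q]) * H_inv[:, q]
    s_new = s + delta
    s_new[q] = 0.0  # already zero up to rounding; make it exact
    return s_new, q, sal[q]

rng = np.random.default_rng(0)
r = 8
s = np.sort(rng.uniform(0.5, 5.0, size=r))[::-1]   # toy singular values

# ASSUMPTION: synthetic symmetric positive-definite Hessian over the
# singular values; a real one would come from the model's loss.
A = rng.standard_normal((r, r))
H = A @ A.T + r * np.eye(r)
H_inv = np.linalg.inv(H)

s_new, q, predicted = obs_prune_one(s, H_inv)

# Under a quadratic loss model, compare compensated and naive removal.
delta = s_new - s
loss_compensated = 0.5 * delta @ H @ delta
loss_naive = 0.5 * s[q] ** 2 * H[q, q]   # zero s[q], leave the rest untouched
```

Under the quadratic model, `loss_compensated` matches the saliency of the removed value and never exceeds `loss_naive`. That gap is the benefit of adjusting the retained values rather than simply truncating.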

By adjusting singular values directly, the technique recovers part of the quality lost to truncation, as measured by perplexity, without the computational cost of retraining.