Researchers have introduced SVD-Surgeon, a training-free method for compressing large language models that applies the Optimal Brain Surgeon (OBS) framework to singular-value decomposition (SVD). It computes closed-form updates to the retained singular values that compensate for the error introduced by truncation, and it uses a saliency score to decide which singular values to prune.
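For orientation, the textbook OBS formulas can be written with the vector of singular values $\boldsymbol{\sigma} = (\sigma_1, \dots, \sigma_r)$ as the parameters. $H$ is the Hessian of the loss with respect to $\boldsymbol{\sigma}$, and $\mathbf{e}_q$ is the $q$-th unit vector. This is a sketch of the general framework under that assumption. The summary does not state the paper's exact loss, its Hessian estimate, or whether it removes values singly or in groups.

```latex
\delta\boldsymbol{\sigma} = -\frac{\sigma_q}{[H^{-1}]_{qq}}\, H^{-1}\mathbf{e}_q,
\qquad
S_q = \frac{\sigma_q^{2}}{2\,[H^{-1}]_{qq}}
```

The update $\delta\boldsymbol{\sigma}$ sets $\sigma_q$ to zero while shifting the retained singular values to offset the change. $S_q$ is the predicted second-order increase in loss, so the values with the smallest $S_q$ are the natural pruning candidates.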
- The method treats each singular value as an individual parameter and uses a second-order approximation of the loss to compensate for the values that are removed.
- The framework also yields a saliency metric that ranks singular values for pruning.
- SVD-Surgeon operates directly on the singular-value factorization, so it can be layered on top of existing SVD-based compressors.
- When applied on top of SVD-LLM, it improves the perplexity-versus-compression trade-off on the OPT model family and on LLaMA 2-7B, without retraining.
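The mechanism in the list above can be sketched in NumPy. This is a minimal illustration, not the paper's procedure. It assumes a positive-definite Hessian `H` of the loss with respect to the singular values is already available. The summary does not describe how SVD-Surgeon estimates that Hessian, so the demo uses a made-up one.

The factors `U`, `s`, `Vt` come from `np.linalg.svd` and stand in for the factorization an existing SVD compressor would produce. SVD-LLM's own factorization is not reproduced here. The sketch does three things:

- It greedily prunes the `k` singular values with the lowest OBS saliency.
- After each removal, it applies the closed-form compensation to the remaining values.
- It rebuilds the low-rank weight from the retained factors.

The function name `obs_prune_singular_values`, the one-at-a-time schedule, and the quadratic loss model are hypothetical choices.

```python
import numpy as np


def obs_prune_singular_values(s, H, k):
    """Zero k singular values one at a time with Optimal Brain Surgeon updates.

    Hypothetical helper, not SVD-Surgeon's code.
    s: (r,) singular values, treated as the parameter vector.
    H: (r, r) positive-definite Hessian of the loss w.r.t. s (assumed given;
       as in OBS, the gradient at the original values is taken as ~0).
    k: number of singular values to remove.
    Returns the compensated singular values and the pruned indices.
    """
    s = np.asarray(s, dtype=float).copy()
    Hinv = np.linalg.inv(H)
    free = np.ones(s.size, dtype=bool)
    pruned = []
    for _ in range(k):
        # OBS saliency: predicted loss increase from zeroing each free value.
        saliency = np.full(s.size, np.inf)
        saliency[free] = s[free] ** 2 / (2.0 * np.diag(Hinv)[free])
        q = int(np.argmin(saliency))
        # Closed-form compensation: drives s[q] to zero and shifts the others.
        s -= (s[q] / Hinv[q, q]) * Hinv[:, q]
        # Restrict the inverse Hessian to the coordinates that remain free.
        Hinv -= np.outer(Hinv[:, q], Hinv[q, :]) / Hinv[q, q]
        free[q] = False
        pruned.append(q)
    s[~free] = 0.0  # clear floating-point residue on pruned entries
    return s, pruned


# Hypothetical demo data: a random "weight" matrix and a made-up dense,
# positive-definite Hessian H = J^T J + 1e-3 I over its singular values.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
U, s, Vt = np.linalg.svd(W, full_matrices=False)
J = rng.standard_normal((100, s.size))
H = J.T @ J + 1e-3 * np.eye(s.size)

s_obs, pruned = obs_prune_singular_values(s, H, k=8)
keep = np.setdiff1d(np.arange(s.size), pruned)
# Low-rank weight from the retained factors. A compensated value may turn
# negative, which is harmless because column U[:, i] can absorb the sign.
W_low = U[:, keep] @ np.diag(s_obs[keep]) @ Vt[keep, :]


def quad_loss(s_new):
    """Second-order loss model around the original singular values."""
    d = s_new - s
    return 0.5 * d @ H @ d


s_naive = s.copy()
s_naive[pruned] = 0.0  # same indices, no compensation
print(W_low.shape, quad_loss(s_obs) <= quad_loss(s_naive))
```

The sequential updates shrink the inverse Hessian to the remaining coordinates. As a result, the final values exactly minimize the quadratic loss model with the pruned entries fixed at zero, so `quad_loss(s_obs)` cannot exceed the loss of zeroing the same entries without compensation. With a diagonal `H`, the compensation term vanishes and the procedure reduces to pruning by saliency alone.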
By adjusting the retained singular values directly instead of simply truncating, the technique lowers perplexity at a given compression ratio without the computational cost of retraining.