Output-Space Allocation Costs for Calibration-Guided LLM Compression: An Empirical Study

This study investigates whether aligning allocation costs with an output-space objective improves the fidelity of compressed large language models, testing a modification to the ROCKET compression method. The authors compare two cost functions for the multiple-choice knapsack problem (MCKP) allocation: weight-space Frobenius error and output reconstruction error.
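To make the two cost functions concrete, the following is a minimal NumPy sketch under stated assumptions, not the ROCKET implementation. The function names, the calibration matrix `X`, and the use of truncated SVD as the compression step are all hypothetical stand-ins chosen for illustration.

```python
import numpy as np

def weight_space_cost(W, W_hat):
    """Squared Frobenius error between original and compressed weights."""
    return float(np.linalg.norm(W - W_hat, "fro") ** 2)

def output_space_cost(W, W_hat, X):
    """Squared Frobenius error between layer outputs on calibration inputs.

    X has shape (in_features, n_samples); each column is one calibration
    activation (an assumption made for this sketch).
    """
    return float(np.linalg.norm((W - W_hat) @ X, "fro") ** 2)

def low_rank_approx(W, rank):
    """Truncated-SVD stand-in for a compression step (illustrative only)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

# Synthetic example: random weights and random calibration activations.
rng = np.random.default_rng(0)
W = rng.standard_normal((6, 5))
X = rng.standard_normal((5, 32))
W_hat = low_rank_approx(W, rank=2)

print("weight-space cost:", weight_space_cost(W, W_hat))
print("output-space cost:", output_space_cost(W, W_hat, X))
```

When the calibration activations are isotropic (for example, `X` is the identity), the two costs coincide; they diverge only to the extent that the activation statistics weight some input directions more than others.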

On Qwen3-8B at 50% compression, the proposed ROCKET-ActCost achieved 0.8 percentage points higher average accuracy across 8 zero-shot benchmarks (53.1% vs 52.3%).
The same configuration worsened WikiText perplexity by 16%, from 52.98 to 61.46.
A high correlation (>0.99) between weight-space and output-space errors limits allocation divergence, explaining the modest effect size.
On Llama-3.2-1B at 20% compression, both methods produced near-identical results (53.3% vs 53.5% accuracy).
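The link between highly correlated costs and near-identical allocations can be illustrated with a toy multiple-choice knapsack solver. This is a sketch under assumptions: `solve_mckp`, the integer option sizes, and the cost tables are hypothetical toy values, not data or code from the study.

```python
import numpy as np

def solve_mckp(sizes, costs, budget):
    """Pick one option per layer, minimizing total cost with total size <= budget.

    sizes[i][j] is the (integer) size of option j for layer i and
    costs[i][j] is its cost. Returns the chosen option index per layer.
    """
    best = {0: (0.0, [])}  # total size used -> (min cost, picks so far)
    for layer_sizes, layer_costs in zip(sizes, costs):
        nxt = {}
        for used, (cost, picks) in best.items():
            for j, (s, c) in enumerate(zip(layer_sizes, layer_costs)):
                total = used + s
                if total > budget:
                    continue
                if total not in nxt or cost + c < nxt[total][0]:
                    nxt[total] = (cost + c, picks + [j])
        best = nxt
    if not best:
        raise ValueError("no feasible allocation under this budget")
    return min(best.values(), key=lambda entry: entry[0])[1]

# Toy problem: two layers, three options each (larger size, lower error).
sizes = [[1, 2, 3], [1, 2, 3]]
weight_costs = [[9.0, 4.0, 1.0], [6.0, 2.0, 1.0]]
# Hypothetical output-space costs: roughly a rescaling of the weight-space ones.
output_costs = [[22.0, 10.5, 2.4], [15.5, 4.8, 2.6]]

r = np.corrcoef(np.ravel(weight_costs), np.ravel(output_costs))[0, 1]
picks_w = solve_mckp(sizes, weight_costs, budget=4)
picks_o = solve_mckp(sizes, output_costs, budget=4)
print("correlation:", r)
print("weight-space allocation:", picks_w)
print("output-space allocation:", picks_o)
```

Because the knapsack solution depends only on the relative ordering and spacing of option costs, two cost tables that are close to a positive rescaling of each other tend to select the same options, which is consistent with the modest effect size reported above.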

The findings indicate that the two allocation objectives favor different downstream metrics: at 50% compression on Qwen3-8B, the output-space cost yields higher zero-shot accuracy, while the weight-space cost yields lower perplexity. At lower compression ratios, the choice of cost function has only a minor effect on model performance.