A Reddit user argues that Kullback-Leibler (KL) divergence is a flawed metric for measuring how far an abliterated model has drifted from its base version. The author notes that KL can be represented in many ways, depends entirely on the evaluation prompts, and is often manipulated by reporting first-token KL to make a model appear superior.
- KL can be represented in multiple ways, so there is no single canonical figure to report.
- The result depends completely on the evaluation prompts used.
- First-token KL is frequently used to make a model look better in comparisons than it otherwise would.
The author seeks community feedback on whether this assessment is accurate and asks for recommendations on better methods to measure differences between abliterated and base models.
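
To make the first-token concern concrete, here is a minimal sketch in plain Python. It is not the author's method: the three-token vocabulary and every probability below are hypothetical values chosen for illustration, and `kl_divergence` is a helper written for this example. It shows one way a first-token figure can sit near zero while the average over later positions is much larger, and that swapping the direction of the divergence changes the number.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same vocabulary.

    Assumes every entry of q is strictly positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical next-token distributions, one row per generated position,
# for the same prompt run through a base model and an abliterated model.
base = [
    [0.70, 0.20, 0.10],
    [0.60, 0.30, 0.10],
    [0.50, 0.25, 0.25],
]
abliterated = [
    [0.69, 0.21, 0.10],  # nearly identical at the first token
    [0.20, 0.70, 0.10],  # diverges from the second token onward
    [0.10, 0.10, 0.80],
]

# First-token KL looks only at position 0.
first_token_kl = kl_divergence(base[0], abliterated[0])

# Averaging over every position tells a different story.
per_position_kl = [kl_divergence(p, q) for p, q in zip(base, abliterated)]
mean_kl = sum(per_position_kl) / len(per_position_kl)

# KL is asymmetric, so the direction must be stated alongside the number.
forward_kl = kl_divergence(base[1], abliterated[1])
reverse_kl = kl_divergence(abliterated[1], base[1])

print(f"first-token KL: {first_token_kl:.4f}")
print(f"mean KL over positions: {mean_kl:.4f}")
print(f"forward vs reverse at position 1: {forward_kl:.4f} vs {reverse_kl:.4f}")
```

The same sketch hints at the prompt-dependence point: a different prompt would yield different rows in `base` and `abliterated`, and therefore a different KL, so a reported figure is only meaningful together with the prompt set, the positions included, and the direction used.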