Positive-Unlabeled Learning for LLM Evaluation Auditing

A new framework uses positive-unlabeled (PU) learning and partial optimal transport to audit biases in LLM evaluation. It aligns human-verified positive outputs with unlabeled model responses in embedding space, identifying consistent human preferences and correcting verbosity bias without retraining. Experiments show improved alignment with human judgments, robustness to presentation biases, and interpretable confidence estimates.
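
The alignment step can be illustrated with a small sketch. It is not the framework's actual implementation, which the summary does not specify. It assumes squared Euclidean cost between embeddings, a known positive-class prior, and a zero-cost dummy source that absorbs the unlabeled mass not matched to any positive. It also swaps in entropic (Sinkhorn) regularization for an exact solver to keep the code short. The function name `partial_ot_scores`, its parameters, and the synthetic data are all hypothetical.

```python
import numpy as np

def partial_ot_scores(pos_emb, unl_emb, prior=0.5, eps=0.05, n_iter=500):
    """Score each unlabeled embedding by the share of its mass matched to positives.

    Hypothetical helper: partial transport is emulated with a dummy source row,
    and the plan is found with entropic (Sinkhorn) iterations, not an exact solver.
    """
    n_p, n_u = len(pos_emb), len(unl_emb)
    # Squared Euclidean cost between every positive and every unlabeled point,
    # rescaled to [0, 1] so that eps has a comparable meaning across datasets.
    cost = ((pos_emb[:, None, :] - unl_emb[None, :, :]) ** 2).sum(axis=-1)
    cost = cost / cost.max()
    # Dummy source row with zero cost: it absorbs the (1 - prior) share of
    # unlabeled mass that is assumed not to belong to the positive class.
    cost_ext = np.vstack([cost, np.zeros((1, n_u))])
    a = np.concatenate([np.full(n_p, prior / n_p), [1.0 - prior]])  # source masses
    b = np.full(n_u, 1.0 / n_u)                                      # target masses
    K = np.exp(-cost_ext / eps)
    v = np.ones(n_u)
    for _ in range(n_iter):
        u = a / (K @ v)      # rescale to match the source marginals
        v = b / (K.T @ u)    # rescale to match the target marginals
    plan = u[:, None] * K * v[None, :]
    # Fraction of each unlabeled point's mass that came from real positives.
    return plan[:n_p].sum(axis=0) / b

# Synthetic stand-ins for embeddings (assumed data, for illustration only).
rng = np.random.default_rng(0)
pos = rng.normal(0.0, 0.3, size=(20, 2))    # human-verified positives
near = rng.normal(0.0, 0.3, size=(15, 2))   # unlabeled, close to the positives
far = rng.normal(5.0, 0.3, size=(15, 2))    # unlabeled, far from the positives
scores = partial_ot_scores(pos, np.vstack([near, far]), prior=0.5)
```

Under these assumptions, each score lies between 0 and 1 and can be read as a rough confidence that an unlabeled response resembles the verified positives. The prior controls how much unlabeled mass is allowed to match, so a misspecified prior shifts every score.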