RedactionBench: A Benchmark for Contextual Privacy in AI

RedactionBench introduces a manually annotated benchmark of 200 diverse documents spanning 11 domains for evaluating privacy-preserving redaction. It also introduces R-Score, a character-level metric that scores semantically similar redactions equally and reduces bias arising from formatting choices. Human evaluations show substantial disagreement on contextual redactions, with only 47.7% consensus. This highlights the subjective nature of privacy and motivates standardized, context-aware benchmarks.
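To illustrate why a character-level score can be independent of formatting, the sketch below compares predicted and gold redactions as character offsets in the original document rather than as redacted output strings. Whether a system writes "[NAME]", "█████", or "[REDACTED]" therefore does not change the score. This is a minimal sketch under stated assumptions and is not the published R-Score definition. The span representation (half-open `(start, end)` offsets) and the helper names `to_char_set` and `char_level_f1` are hypothetical, and F1 over covered characters is a generic overlap measure that may differ from R-Score's actual formula.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: a redaction is a half-open [start, end)
# character-offset span into the ORIGINAL document text.
Span = Tuple[int, int]


def to_char_set(spans: Iterable[Span]) -> Set[int]:
    """Expand redaction spans into the set of character positions they cover."""
    covered: Set[int] = set()
    for start, end in spans:
        covered.update(range(start, end))
    return covered


def char_level_f1(predicted: Iterable[Span], gold: Iterable[Span]) -> float:
    """Generic character-level F1 between predicted and gold redactions.

    Assumption: this is an illustrative overlap score, not R-Score itself.
    Because it compares offsets in the source text, the placeholder style a
    system uses in its redacted output has no effect on the result.
    """
    pred = to_char_set(predicted)
    ref = to_char_set(gold)
    if not pred and not ref:
        return 1.0  # nothing to redact and nothing redacted: perfect agreement
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    text = "Contact John Smith at 555-0100."
    gold = [(8, 18), (22, 30)]       # "John Smith", "555-0100"
    predicted = [(13, 18), (22, 30)]  # "Smith", "555-0100"
    print(text[8:18], "|", text[13:18], "|", text[22:30])
    print(round(char_level_f1(predicted, gold), 4))
```

In this usage example the prediction redacts only the surname, so precision is 1.0 and recall is 13/18, giving an F1 of 26/31. Partial redactions thus earn partial credit instead of being scored as complete misses, which is one plausible motivation for working at the character level.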