Natural Identifiers for Privacy and Data Audits in Large Language Models

This work introduces natural identifiers (NIDs): structured random strings, such as cryptographic hashes and shortened URLs, that occur naturally in LLM training data. NIDs address two obstacles in auditing LLM privacy: they enable scalable, post-hoc differential privacy auditing without costly retraining, and they make dataset inference possible without a private held-out dataset.

Existing privacy auditing methods often require inserting canary data during training or rely on non-member held-out datasets that are unavailable in practice.
NIDs are naturally occurring structured random strings that allow for the generation of unlimited additional random strings from the same distribution.
These generated strings serve as alternative canaries for audits and as same-distribution held-out data for dataset inference.
The evaluation demonstrates that NIDs enable post-hoc differential privacy auditing without retraining.
NIDs also enable dataset inference for any suspect dataset that contains them, without requiring a private non-member held-out dataset.
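To make the idea of generating additional strings from the same distribution concrete, here is a minimal Python sketch. It assumes, for illustration only, that the NIDs are hex-encoded digests or short-URL slugs drawn uniformly from a fixed alphabet; the function names, the `https://example.com/` prefix, the base-62 alphabet, and the slug length are hypothetical choices, not details taken from the work.

```python
import secrets
import string

HEX = "0123456789abcdef"
BASE62 = string.ascii_letters + string.digits  # assumed slug alphabet


def random_hex_digest(n_chars: int = 64) -> str:
    """Sample a string with the format of a hex digest (64 chars matches SHA-256)."""
    return "".join(secrets.choice(HEX) for _ in range(n_chars))


def random_short_url(prefix: str = "https://example.com/", slug_len: int = 7) -> str:
    """Sample a shortened-URL look-alike; prefix and slug length are assumptions."""
    return prefix + "".join(secrets.choice(BASE62) for _ in range(slug_len))


def sample_reference_set(observed: list[str], n: int) -> list[str]:
    """Draw n fresh hex strings whose lengths are taken from the observed NIDs.

    Candidates that collide with an observed NID (or with each other) are
    rejected, so every returned string is new. Assumes `observed` is a
    non-empty list of hex strings.
    """
    seen = set(observed)
    lengths = [len(s) for s in observed]
    fresh: list[str] = []
    while len(fresh) < n:
        candidate = random_hex_digest(secrets.choice(lengths))
        if candidate not in seen:
            seen.add(candidate)
            fresh.append(candidate)
    return fresh
```

Strings produced this way could then play the role of non-member canaries or held-out data, provided the uniformity assumption actually holds for the identifiers in question.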

This approach allows for scalable, post-hoc audits of already-trained models and dataset inference for real-world cases where traditional held-out datasets are difficult to construct.
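One way such a dataset-inference check might look in practice is sketched below. It is not the procedure used in this work: it swaps in a plain one-sided permutation test, and it assumes the per-string model losses have already been computed elsewhere. The function name `permutation_p_value` and its parameters are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(suspect_losses, reference_losses, n_perm=10_000, seed=0):
    """One-sided permutation test on mean loss.

    suspect_losses:   model losses on NIDs found in the suspect dataset
    reference_losses: model losses on freshly generated strings of the same format
    Returns a p-value for the null hypothesis that both groups are exchangeable,
    against the alternative that suspect NIDs have lower loss (were trained on).
    """
    rng = random.Random(seed)
    observed = mean(reference_losses) - mean(suspect_losses)
    pooled = list(suspect_losses) + list(reference_losses)
    k = len(suspect_losses)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Relabel: first k values act as "suspect", the rest as "reference".
        diff = mean(pooled[k:]) - mean(pooled[:k])
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)
```

A small p-value would suggest the model assigns systematically lower loss to the suspect NIDs than to fresh strings of the same format, which is the kind of signal dataset inference looks for; the significance threshold is left to the auditor.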