Beyond the Mean: Three-Axis Fidelity for Aligning LLM-Based Survey Simulators from Small Pilot Data

This study investigates whether large language models (LLMs), used as survey simulators, can recover the statistical characteristics of a broader population from only a small pilot sample of human responses. The authors decompose this recovery into three axes: structural fidelity, marginal fidelity, and individual fidelity.
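The summary does not say how each axis is scored. As a purely illustrative sketch, the three axes could be operationalized on paired human and simulated answers to Likert-style questions as shown below. The metric choices are assumptions, not the paper's definitions: total variation distance between per-question answer distributions stands in for marginal fidelity, the gap between inter-item correlation matrices for structural fidelity, and the exact-match rate per respondent and question for individual fidelity. The function names and the toy data are hypothetical.

```python
import numpy as np

# Hypothetical metrics; rows are respondents, columns are survey questions.

def marginal_tvd(human, sim, levels):
    """Marginal fidelity (assumed): mean total variation distance between
    per-question answer distributions. Lower is better."""
    tvds = []
    for j in range(human.shape[1]):
        p = np.array([np.mean(human[:, j] == v) for v in levels])
        q = np.array([np.mean(sim[:, j] == v) for v in levels])
        tvds.append(0.5 * np.abs(p - q).sum())
    return float(np.mean(tvds))

def structural_gap(human, sim):
    """Structural fidelity (assumed): mean absolute difference between the
    off-diagonal entries of the inter-item correlation matrices. Lower is better."""
    ch = np.corrcoef(human, rowvar=False)
    cs = np.corrcoef(sim, rowvar=False)
    mask = ~np.eye(ch.shape[0], dtype=bool)
    return float(np.mean(np.abs(ch[mask] - cs[mask])))

def individual_accuracy(human, sim):
    """Individual fidelity (assumed): share of (respondent, question) cells
    where the simulated answer exactly matches the human one. Higher is better."""
    return float(np.mean(human == sim))

# Toy data (hypothetical): 4 respondents answering 3 questions on a 1-5 scale.
levels = range(1, 6)
human = np.array([[1, 2, 5], [2, 2, 4], [4, 5, 1], [5, 4, 2]])
sim = np.array([[1, 3, 5], [2, 2, 4], [5, 4, 1], [4, 5, 2]])

print(round(marginal_tvd(human, sim, levels), 4))  # 0.0833
print(round(individual_accuracy(human, sim), 4))   # 0.5833
print(round(structural_gap(human, sim), 4))
```

The toy data show why the axes can diverge: the first and third questions have identical marginal distributions in both samples, yet several individual answers are swapped between respondents. That lowers individual accuracy and can also shift the correlation structure.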

The study benchmarks prompting, rectification, and fine-tuning approaches, using a COVID-19 misinformation survey as a case study.
Findings indicate that fine-tuning on small pilot samples offers a balanced way to achieve multiple forms of fidelity at once.
However, the level of fidelity achieved through fine-tuning can vary across subsamples, which may threaten pluralistic alignment.