The authors introduce SocialPersona, a benchmark designed to evaluate whether multimodal large language models (MLLMs) can recover revealed preferences from longitudinal social-media timelines and use them in dialogue. Current evaluations focus only on explicit memory; this work instead tests whether a model can infer a user's interests from naturally occurring multimodal traces.
- Built from longitudinal timelines of 171 everyday, non-promotional social-media users.
- Contains text, images, timestamps, and 2,597 human-verified preference tags across seven interest domains.
- Separates stable interests from recent interests to test temporal reasoning.
- Supports two tasks: constructing structured user profiles and generating responses aligned with inferred profiles.
- Experiments show that models identify broad interest domains well but struggle to recover fine-grained and recent interests and to personalize their dialogue responses.
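To make the profile-construction task and the stable/recent split more concrete, here is a minimal Python sketch of how a timeline and its preference tags might be represented and scored. It is an illustration under stated assumptions, not the benchmark's released format or official metric: the class names (`Post`, `PreferenceTag`, `UserTimeline`), the function `score_profile`, the domain names used in the example, and the choice of exact-match precision/recall/F1 are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Literal, Optional

# Hypothetical schema: not SocialPersona's actual data format.
Horizon = Literal["stable", "recent"]


@dataclass(frozen=True)
class Post:
    text: str
    image_path: Optional[str]  # hypothetical: path to an attached image, if any
    timestamp: datetime


@dataclass(frozen=True)
class PreferenceTag:
    domain: str       # one of the seven interest domains (names assumed here)
    tag: str          # fine-grained interest, e.g. "trail running"
    horizon: Horizon  # whether the interest is stable or recent


@dataclass
class UserTimeline:
    user_id: str
    posts: list[Post] = field(default_factory=list)
    gold_tags: set[PreferenceTag] = field(default_factory=set)


def prf1(pred: set, gold: set) -> tuple[float, float, float]:
    """Exact-match precision, recall, and F1 between two sets."""
    tp = len(pred & gold)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def score_profile(
    pred: set[PreferenceTag], gold: set[PreferenceTag]
) -> dict[str, tuple[float, float, float]]:
    """Hypothetical scoring: coarse domain match plus per-horizon tag match."""
    scores = {"domain": prf1({t.domain for t in pred}, {t.domain for t in gold})}
    for h in ("stable", "recent"):
        scores[h] = prf1(
            {t for t in pred if t.horizon == h},
            {t for t in gold if t.horizon == h},
        )
    return scores


if __name__ == "__main__":
    # Toy values for illustration only; not drawn from the benchmark.
    gold = {
        PreferenceTag("sports", "trail running", "stable"),
        PreferenceTag("food", "sourdough baking", "recent"),
    }
    pred = {
        PreferenceTag("sports", "trail running", "stable"),
        PreferenceTag("food", "ramen", "recent"),
    }
    print(score_profile(pred, gold))
```

Scoring at two granularities separates the two failure modes the summary describes: in the toy example, the predicted profile gets both domains right (domain F1 of 1.0) but misses the recent fine-grained tag (recent F1 of 0.0).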
The results indicate that robust cross-modal, long-horizon user modeling remains a key challenge, and SocialPersona can help measure progress toward assistants that infer and act on revealed preferences.