SocialPersona: Benchmarking Personalized Profiling and Response with Multimodal Social-Media Context

The authors introduce SocialPersona, a benchmark designed to evaluate whether multimodal large language models (MLLMs) can recover revealed preferences from longitudinal social-media timelines and use them in dialogue. Current evaluations focus only on explicit memory; SocialPersona instead tests a model's ability to infer interests from natural multimodal traces.

Built from longitudinal timelines of 171 everyday, non-promotional social-media users.
Contains text, images, timestamps, and 2,597 human-verified preference tags across seven interest domains.
Separates stable interests from recent interests to test temporal reasoning.
Supports two tasks: constructing structured user profiles and generating responses aligned with inferred profiles.
Experiments show that models identify broad domains well but struggle with fine-grained and recent interests and with dialogue personalization.
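
The dataset layout and the gap between broad-domain and fine-grained accuracy described above can be sketched in a few lines of Python. This is only an illustrative sketch: the class names, field names, example domains ("food", "sports"), and the set-overlap F1 score are all assumptions made here, not the benchmark's actual schema or metric.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Post:
    """One timeline entry: text, optional images, and a timestamp."""
    timestamp: datetime
    text: str
    image_paths: list = field(default_factory=list)


@dataclass(frozen=True)
class PreferenceTag:
    """A human-verified preference tag (field names are hypothetical)."""
    domain: str   # broad interest domain, e.g. "food" (illustrative only)
    label: str    # fine-grained interest within the domain
    recent: bool  # True for a recent interest, False for a stable one


@dataclass
class UserRecord:
    user_id: str
    timeline: list
    tags: list

    def interests(self, recent: bool) -> set:
        """Return (domain, label) pairs for recent or stable interests."""
        return {(t.domain, t.label) for t in self.tags if t.recent == recent}


def set_f1(predicted: set, gold: set) -> float:
    """Set-overlap F1; an assumed metric, not necessarily the benchmark's."""
    hits = len(predicted & gold)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


# A made-up user with one recent and one stable interest.
user = UserRecord(
    user_id="u001",
    timeline=[Post(datetime(2024, 5, 1), "Tried a new ramen place", ["img1.jpg"])],
    tags=[
        PreferenceTag("food", "ramen", recent=True),
        PreferenceTag("sports", "trail running", recent=False),
    ],
)

gold = user.interests(recent=True) | user.interests(recent=False)
# A hypothetical model profile: right domains, one wrong fine-grained label.
predicted = {("food", "sushi"), ("sports", "trail running")}

domain_f1 = set_f1({d for d, _ in predicted}, {d for d, _ in gold})
label_f1 = set_f1(predicted, gold)
print(domain_f1, label_f1)  # prints: 1.0 0.5
```

In this toy case the predicted profile matches every domain but misses one fine-grained interest, so the domain-level score is perfect while the label-level score drops by half, which mirrors the kind of gap the experiments report.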

The results indicate that robust cross-modal, long-horizon user modeling remains a key challenge, and SocialPersona can help measure progress toward assistants that infer and act on revealed preferences.