PIVOTSBench: Benchmark for Fine-Grained Interpersonal Reasoning in MLLMs

PIVOTSBench is the first benchmark to evaluate how well multimodal large language models (MLLMs) reason about bidirectional interpersonal relationships. It is built from Social-IQ 2.0 and YouTube data, and its relationship dimensions are grounded in psychology research. The benchmark includes auxiliary tasks that assess visual cue identification. Accompanying ablation studies cover visual modalities and social role information, and a further analysis examines how joint and pairwise predictions improve performance on the relationship dimensions.