TailorMind: Towards Preference-Aligned Multimodal Content Generation
The authors introduce TailorMind, a system for personalized multimodal content generation that creates user-tailored outputs without relying on an existing item pool or waiting for matching user-generated content to appear. The approach links collaborative preference modeling with controllable multimodal generation: it enriches sparse user histories through hypergraph collaborative filtering and further optimizes textual user profiles using ranking-error feedback and textual gradient descent so that they better capture user preferences. To maintain output quality, the system applies retrieval-augmented style control grounded in authentic patterns, together with cross-modal cohesion reflection to reduce semantic drift. The researchers also present TailorBench, a benchmark that evaluates five dimensions: coherence, novelty, aesthetic quality, hallucination, and profiling. Experiments show that TailorMind achieves competitive or stronger coherence than baselines while improving novelty and aesthetic quality over representative generation models and ground-truth data. The system also shows advantages over retrieving available content and achieves Recall gains of up to 29% in reranking tasks.
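
The summary reports the reranking result as a Recall gain but does not state the cutoff or whether the 29% figure is relative or absolute. As a minimal sketch, assuming the standard Recall@K definition and a relative gain over a baseline ranking, the quantity could be computed as follows. The function names `recall_at_k` and `relative_gain` and the toy item IDs are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k of a ranking."""
    if not relevant:
        return 0.0  # convention chosen here: no relevant items means zero recall
    top_k = set(ranked[:k])  # a set, so duplicate IDs are not counted twice
    return len(top_k & relevant) / len(relevant)


def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline` (0.29 would mean +29%)."""
    if baseline == 0:
        raise ValueError("relative gain is undefined for a zero baseline")
    return (improved - baseline) / baseline


# Toy data (illustrative only): one user's held-out items and two candidate rankings.
relevant_items = {"a", "b", "c", "d"}
baseline_ranking = ["x", "a", "y", "b", "c"]
reranked = ["a", "b", "x", "c", "y"]

baseline_recall = recall_at_k(baseline_ranking, relevant_items, k=3)
reranked_recall = recall_at_k(reranked, relevant_items, k=3)
gain = relative_gain(baseline_recall, reranked_recall)
print(f"Recall@3: {baseline_recall:.2f} -> {reranked_recall:.2f} (relative gain {gain:.0%})")
```

In practice such a metric would be averaged over all test users before the gain is computed; the single-user example here only illustrates the arithmetic.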