DAIN: Dynamic Agent-Based Interaction Network for Efficient and Collaborative Multimodal Reasoning

Researchers introduce the Dynamic Agent-based Interaction Network (DAIN), a framework that treats multimodal fusion as a dynamic, multi-agent collaborative process instead of a static architecture. DAIN uses a context-aware Meta-Controller to schedule sparse activation of specialized agents and to orchestrate compressed communication among them for consensus-building.
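
To make sample-wise sparse activation concrete, the sketch below shows one common way a controller can pick a few agents per input: a top-k softmax gate. This summary does not specify how DAIN's Meta-Controller scores or selects agents, so the gating rule, the function names (select_agents, fuse), and the toy numbers are all illustrative assumptions, not the authors' method.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over raw controller scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_agents(scores: Sequence[float], k: int) -> List[float]:
    """Hypothetical gate: keep the k highest-scoring agents, zero the rest.

    Returns one weight per agent; the k surviving weights are
    renormalized so they sum to 1.
    """
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of agents")
    probs = softmax(scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    gates = [0.0] * len(probs)
    for i in top:
        gates[i] = probs[i] / mass
    return gates


def fuse(agent_outputs: Sequence[Sequence[float]], gates: Sequence[float]) -> List[float]:
    """Gate-weighted sum of agent outputs; agents with a zero gate are skipped."""
    dim = len(agent_outputs[0])
    fused = [0.0] * dim
    for out, g in zip(agent_outputs, gates):
        if g == 0.0:
            continue  # an inactive agent would not need to run at all
        for j in range(dim):
            fused[j] += g * out[j]
    return fused


# Toy usage: four agents, two active for this sample (assumed scores).
gates = select_agents([2.0, 0.1, 1.0, -1.0], k=2)
fused = fuse([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]], gates)
```

Because the gate is computed per sample, a different input with different scores would activate a different subset of agents, which is the property the summary refers to as context-aware scheduling.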

Employs a multi-objective loss function to jointly optimize task accuracy, agent specialization, and operational efficiency via sparse activation and communication regularization.
Achieves state-of-the-art performance across five benchmarks (ADNI, MIMIC-IV, MM-IMDB, CMU-MOSI, ENRICO), including a 2.6% accuracy gain on ADNI.
Enhances interpretability by exposing context-dependent agent roles and collaboration patterns while maintaining computational efficiency through sample-wise sparse activation.
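
The multi-objective loss mentioned above can be pictured as a weighted sum of a task term and several regularizers. The sketch below is a minimal illustration under stated assumptions: the summary does not give the actual loss terms or weights, so the specific choices here (cross-entropy for the task, mean pairwise cosine similarity as a specialization penalty, gate entropy as a sparsity penalty, mean squared message magnitude as a communication cost, and the default weights) are hypothetical stand-ins.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Task term: negative log-probability of the true class."""
    return -math.log(max(probs[label], 1e-12))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; defined as 0 when either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def specialization_penalty(agent_outputs: Sequence[Sequence[float]]) -> float:
    """Assumed proxy: mean pairwise cosine similarity (lower means more distinct agents)."""
    n = len(agent_outputs)
    if n < 2:
        return 0.0
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(cosine(agent_outputs[i], agent_outputs[j]) for i, j in pairs) / len(pairs)


def gate_entropy(gates: Sequence[float]) -> float:
    """Assumed sparsity proxy: entropy of normalized gates (0 for a one-hot gate)."""
    return -sum(g * math.log(g) for g in gates if g > 0.0)


def communication_cost(messages: Sequence[Sequence[float]]) -> float:
    """Assumed communication proxy: mean squared magnitude of exchanged messages."""
    values = [v for msg in messages for v in msg]
    if not values:
        return 0.0
    return sum(v * v for v in values) / len(values)


def total_loss(probs, label, gates, agent_outputs, messages,
               w_spec: float = 0.1, w_sparse: float = 0.01, w_comm: float = 0.01) -> float:
    """Weighted sum of the four terms; the weights are illustrative defaults."""
    return (cross_entropy(probs, label)
            + w_spec * specialization_penalty(agent_outputs)
            + w_sparse * gate_entropy(gates)
            + w_comm * communication_cost(messages))
```

With this form, raising w_sparse or w_comm trades some task accuracy for fewer active agents or lighter messages, which is one way to read the accuracy-versus-efficiency balance the summary describes.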

The work demonstrates the effectiveness of dynamic, agent-based paradigms for multimodal reasoning, offering improved performance and interpretability compared to traditional static Mixture-of-Experts approaches.