Researchers introduce the Dynamic Agent-based Interaction Network (DAIN), a framework that treats multimodal fusion as a dynamic, multi-agent collaborative process instead of a static architecture. A context-aware Meta-Controller sparsely activates specialized agents for each input sample and orchestrates compressed communication among them to build consensus.

  • Employs a multi-objective loss function to jointly optimize task accuracy, agent specialization, and operational efficiency via sparse activation and communication regularization.
  • Achieves state-of-the-art performance across five benchmarks (ADNI, MIMIC-IV, MM-IMDB, CMU-MOSI, ENRICO), including a 2.6% accuracy gain on ADNI.
  • Enhances interpretability by exposing context-dependent agent roles and collaboration patterns while maintaining computational efficiency through sample-wise sparse activation.
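
To make the sparse activation and the regularized objective above more concrete, here is a minimal sketch in plain Python. It is not DAIN's implementation: the summary does not give the form of the Meta-Controller or of the loss terms, so top-k softmax gating, the helper names (`top_k_gate`, `combined_loss`), the weights `lam_sparse` and `lam_comm`, and the per-agent `message_norms` are all assumptions chosen for illustration. The agent-specialization term is left out because its form is not described.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(scores, k):
    """Hypothetical controller output: keep the k highest-scoring agents
    for one sample, renormalize their weights, and set the rest to 0."""
    probs = softmax(scores)
    keep = set(sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k])
    mass = sum(probs[i] for i in keep)
    return [probs[i] / mass if i in keep else 0.0 for i in range(len(probs))]


def combined_loss(task_loss, gates, message_norms, lam_sparse=0.01, lam_comm=0.01):
    """Assumed objective: task loss plus two penalties.

    - sparsity penalty: fraction of agents that are active for this sample
    - communication penalty: total message size sent by active agents
    """
    active = [i for i, g in enumerate(gates) if g > 0]
    active_fraction = len(active) / len(gates)
    comm_cost = sum(message_norms[i] for i in active)
    return task_loss + lam_sparse * active_fraction + lam_comm * comm_cost


# Example: four agents, the controller activates two of them for this sample.
gates = top_k_gate([2.0, 0.5, 1.0, -1.0], k=2)
loss = combined_loss(1.0, gates, [0.5, 0.5, 0.5, 0.5], lam_sparse=0.1, lam_comm=0.1)
```

The active-agent count used here is not differentiable, so a trainable version would need a smooth surrogate (for example, a penalty on the pre-selection gate probabilities); the sketch only shows how the three pressures of accuracy, sparsity, and communication cost can be combined into one scalar.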

The work demonstrates the effectiveness of dynamic, agent-based paradigms for multimodal reasoning, offering improved performance and interpretability compared to traditional static Mixture-of-Experts approaches.