This paper investigates the use of Large Language Models (LLMs) for data fusion tasks involving tabular data, covering both single-truth and multi-truth scenarios. The study evaluates various prompting strategies across three benchmark datasets to determine their effectiveness in resolving conflicting values from multiple sources.
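The difference between the two scenarios is easiest to see on a toy example. The sketch below is a minimal illustration under stated assumptions. It uses a naive vote-based baseline, not DART, not LTM, and not the LLM approach studied in the paper. The claim format `(source, object, value)`, the function names, the `threshold` parameter, and the sample data are all hypothetical. In the single-truth case, each object gets exactly one value. In the multi-truth case (for example, a book with several co-authors), every value that enough sources support is kept.

```python
from collections import Counter

# Hypothetical claim format: (source, object, value). Toy data only.
Claim = tuple[str, str, str]


def fuse_single_truth(claims: list[Claim]) -> dict[str, str]:
    """Single-truth fusion: keep the one most-voted value per object.

    Ties are broken arbitrarily in this sketch.
    """
    votes: dict[str, Counter] = {}
    for _source, obj, value in claims:
        votes.setdefault(obj, Counter())[value] += 1
    return {obj: counter.most_common(1)[0][0] for obj, counter in votes.items()}


def fuse_multi_truth(claims: list[Claim], threshold: float = 0.5) -> dict[str, set[str]]:
    """Multi-truth fusion: keep every value that at least `threshold`
    of the sources covering the object report."""
    sources_per_obj: dict[str, set[str]] = {}
    value_sources: dict[tuple[str, str], set[str]] = {}
    for source, obj, value in claims:
        sources_per_obj.setdefault(obj, set()).add(source)
        value_sources.setdefault((obj, value), set()).add(source)

    result: dict[str, set[str]] = {obj: set() for obj in sources_per_obj}
    for (obj, value), srcs in value_sources.items():
        if len(srcs) / len(sources_per_obj[obj]) >= threshold:
            result[obj].add(value)
    return result


# Three hypothetical sources disagree about the authors of one book.
claims = [
    ("s1", "book1", "Alice"), ("s1", "book1", "Bob"),
    ("s2", "book1", "Alice"), ("s2", "book1", "Bob"),
    ("s3", "book1", "Alice"), ("s3", "book1", "Carol"),
]
print(fuse_single_truth(claims))  # {'book1': 'Alice'}
print({k: sorted(v) for k, v in fuse_multi_truth(claims).items()})
```

Here Alice is backed by 3/3 sources, Bob by 2/3, and Carol by 1/3. With the default threshold of 0.5, multi-truth fusion keeps Alice and Bob, while single-truth fusion can return only Alice. Truth discovery methods such as DART and LTM go beyond flat voting by modeling source reliability.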

  • Domain-dependent, domain-independent, zero-shot, and one-shot prompts are evaluated empirically on the three benchmark datasets.
  • LLM-based approaches outperform traditional unsupervised truth discovery methods, specifically DART and LTM, across all tested datasets.
  • The codebase for this study has been made publicly available on GitHub.
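The prompt variants in the first bullet differ in two ways. A domain-dependent prompt adds context about the dataset, and a one-shot prompt prepends a single solved example. The following is a hypothetical sketch of how such prompts could be assembled. The `FusionTask` fields, the wording of the templates, and the `domain_hint`/`example` parameters are illustrative assumptions, not the authors' actual prompts. Their released code on GitHub is the authoritative reference.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class FusionTask:
    attribute: str          # hypothetical, e.g. "author"
    entity: str             # hypothetical, e.g. a book title
    candidates: list[str]   # conflicting values reported by different sources
    multi_truth: bool       # whether several candidate values may be correct


def format_question(task: FusionTask) -> str:
    """Render one conflict as a question (illustrative wording)."""
    instruction = (
        "List every correct value" if task.multi_truth
        else "Give the single correct value"
    )
    options = "; ".join(task.candidates)
    return (f"Sources disagree on the {task.attribute} of '{task.entity}'. "
            f"Candidate values: {options}. {instruction}.")


def build_prompt(task: FusionTask,
                 domain_hint: Optional[str] = None,
                 example: Optional[tuple[FusionTask, list[str]]] = None) -> str:
    lines: list[str] = []
    # Domain-dependent variant: prepend dataset context; domain-independent omits it.
    if domain_hint:
        lines.append(f"Context: {domain_hint}")
    # One-shot variant: add one solved example; zero-shot skips this step.
    if example is not None:
        ex_task, ex_answer = example
        lines.append(format_question(ex_task))
        lines.append("Answer: " + "; ".join(ex_answer))
    # The task to be resolved always comes last, with an open answer slot.
    lines.append(format_question(task))
    lines.append("Answer:")
    return "\n".join(lines)


task = FusionTask("author", "Book A", ["Alice", "Bob"], multi_truth=False)
print(build_prompt(task))  # zero-shot, domain-independent
```

Keeping the two dimensions as independent parameters makes it straightforward to generate all four combinations from the same task and compare them on each dataset.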

The authors consider these results important because they show that LLMs can effectively handle data integration problems in which multiple sources provide potentially conflicting information.