The open-source project Mathswitch imports records of mathematical concepts from sources such as Wikidata and Wikipedia. It links records that refer to the same concept without reorganizing the original content. The imported data contains noise, such as non-mathematical or ambiguous items, so the authors test whether a voting ensemble of LLM judges can filter it out.
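A voting ensemble combines the verdicts of several independent judges into one label. The minimal sketch below assumes that each judge is a function that maps an item description to one of three hypothetical labels ("math", "not_math", "unsure"). The source does not specify the labels, the tie-breaking rule, or the judge interface, so all three are illustrative assumptions. Real judges would wrap LLM calls; here they are plain functions.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical judge interface: description in, label out.
# Assumed labels: "math", "not_math", "unsure" (not taken from the source).
Judge = Callable[[str], str]


def ensemble_vote(description: str, judges: Sequence[Judge]) -> str:
    """Return the plurality label across judges; a tie for first place yields 'unsure'."""
    if not judges:
        raise ValueError("at least one judge is required")
    votes = Counter(judge(description) for judge in judges)
    ranked = votes.most_common()
    top_label, top_count = ranked[0]
    # Treat a tie for first place as "no consensus" (an assumed policy).
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return "unsure"
    return top_label


if __name__ == "__main__":
    # Stub judges standing in for LLM calls.
    judges = [
        lambda d: "math",
        lambda d: "math" if "group" in d else "not_math",
        lambda d: "not_math",
    ]
    print(ensemble_vote("a group with a binary operation", judges))  # math
```

An odd number of judges avoids most binary ties. The explicit "unsure" outcome keeps split decisions visible so they can be reviewed manually instead of being silently resolved.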

  • Mathswitch imports records from Wikidata, Wikipedia, MathWorld, Encyclopedia of Mathematics, nLab, ProofWiki, and Agda-Unimath.
  • The project links records referring to the same concept while preserving each source's original structure.
  • The study evaluates LLM voting ensembles on Wikidata items that carry MathWorld identifiers, which serve as a positive control.
  • The authors also examined how the judges' classifications change when database identifiers are removed from the context.
  • Disagreements between the judges and MathWorld were grouped into three categories: degenerate descriptions, narrow-scope bias, and editorial-scope mismatches.
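The positive-control setup and the identifier ablation can be sketched as two classification runs over the same items, one with identifiers and one without. This is a minimal sketch under several assumptions. Every item with a MathWorld identifier is treated as mathematical, so a "not_math" verdict counts as a disagreement with MathWorld. The `Item` fields, the `strip_ids` helper, and the label names are hypothetical, and "unsure" verdicts count as neither agreement nor disagreement.

```python
from dataclasses import dataclass, replace
from typing import Callable, Sequence


@dataclass(frozen=True)
class Item:
    qid: str                       # Wikidata item ID
    description: str               # text shown to the judges
    external_ids: tuple[str, ...]  # database identifiers (hypothetical representation)


Classifier = Callable[[Item], str]  # assumed to return "math", "not_math", or "unsure"


def strip_ids(item: Item) -> Item:
    """Return a copy of the item with database identifiers removed from the context."""
    return replace(item, external_ids=())


def positive_control_report(items: Sequence[Item], classify: Classifier) -> dict:
    """Compare verdicts with and without identifiers on MathWorld-linked items."""
    if not items:
        raise ValueError("no items to evaluate")
    with_ids = {it.qid: classify(it) for it in items}
    without_ids = {it.qid: classify(strip_ids(it)) for it in items}
    n = len(items)
    return {
        # Fraction of items the judges call "math" in each condition.
        "agreement_with_ids": sum(v == "math" for v in with_ids.values()) / n,
        "agreement_without_ids": sum(v == "math" for v in without_ids.values()) / n,
        # Items whose verdict changed once identifiers were removed.
        "flipped": [q for q in with_ids if with_ids[q] != without_ids[q]],
        # Items rejected without identifiers, i.e. disagreements with MathWorld.
        "disagreements": [q for q, v in without_ids.items() if v == "not_math"],
    }


if __name__ == "__main__":
    items = [
        Item("Q_a", "a theorem about prime numbers", ("mathworld:A",)),
        Item("Q_b", "", ("mathworld:B",)),  # empty, degenerate description
    ]
    # Toy classifier: trusts identifiers, otherwise looks for a keyword.
    clf = lambda it: "math" if it.external_ids or "theorem" in it.description else "not_math"
    print(positive_control_report(items, clf))
```

The `disagreements` list would then be the input to a manual review. The toy item with an empty description shows how a degenerate description can be accepted only because its identifier is present.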

The findings suggest that different types of data noise call for different remediation strategies. This could help improve the accuracy of mathematical concept categorization in collaborative knowledge graphs.