Theoria is a verification architecture designed to bridge the gap between formal proof assistants and scalar LLM judges by making AI answers auditable. It rewrites candidate solutions into a sequence of typed state transitions, each licensed by an explicit justification such as a citation or computation.

  • The system enforces completeness of change: every difference between consecutive proof states must be accounted for, which surfaces hidden premises.
  • It produces human-readable proof traces in which each step can be independently challenged.
  • On HLE-Verified Gold, Theoria certifies 105 of 185 problems with 91.4% strict precision.
  • On GPQA Diamond, certified precision reaches 97.1%.
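
The typed state transitions and the completeness-of-change check described above can be illustrated with a minimal sketch. It is not Theoria's implementation: the names Step, changed_keys, unaccounted_changes, and audit, the flat dictionary representation of a proof state, and the accounts_for field are all assumptions made for illustration. The idea shown is only that a step passes when every difference between its before and after states is claimed by its justification, and that an unclaimed difference is reported as a candidate hidden premise.

```python
from dataclasses import dataclass

# Sentinel for "key absent", so that adding or removing a key counts as a change.
_MISSING = object()


@dataclass(frozen=True)
class Step:
    """One hypothetical state transition in a proof trace."""
    before: dict            # proof state before the step
    after: dict             # proof state after the step
    justification: str      # e.g. a citation or a computation
    accounts_for: frozenset  # keys the justification claims to explain


def changed_keys(before: dict, after: dict) -> set:
    """Return every key whose value differs between two states."""
    keys = set(before) | set(after)
    return {k for k in keys if before.get(k, _MISSING) != after.get(k, _MISSING)}


def unaccounted_changes(step: Step) -> set:
    """Return changes that the step's justification does not cover."""
    return changed_keys(step.before, step.after) - step.accounts_for


def audit(steps: list) -> list:
    """Return (step index, unexplained keys) for each step that fails the check."""
    failures = []
    for i, step in enumerate(steps):
        missing = unaccounted_changes(step)
        if missing:
            failures.append((i, missing))
    return failures


steps = [
    # Fully justified: the only change is the new fact "y".
    Step(
        before={"x": 3},
        after={"x": 3, "y": 9},
        justification="computation: y = x**2",
        accounts_for=frozenset({"y"}),
    ),
    # Hidden premise: "x_positive" appears without any justification.
    Step(
        before={"x": 3, "y": 9},
        after={"x": 3, "y": 9, "y_is_square": True, "x_positive": True},
        justification="computation: isqrt(y)**2 == y",
        accounts_for=frozenset({"y_is_square"}),
    ),
]

report = audit(steps)  # [(1, {'x_positive'})]
```

In this sketch a trace would be certified only when audit returns an empty list; because failures are reported per step, a reader can challenge an individual transition without re-evaluating the whole answer.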

Users can therefore check correctness by inspecting a structured trace instead of trusting an opaque score, which makes the approach a complement to holistic LLM judges.