Theoria verifies informal reasoning by auditing typed state transitions

Theoria is a verification architecture that makes AI-generated answers auditable. It is designed to bridge the gap between formal proof assistants and LLM judges that return only a scalar score. It rewrites a candidate solution into a sequence of typed state transitions, each licensed by an explicit justification such as a citation or a computation.
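To make "typed state transitions licensed by a justification" concrete, here is a minimal sketch. It assumes a proof state can be modeled as a set of fact strings and models only the two justification kinds named above (citation, computation). All names (`JustificationKind`, `Justification`, `Transition`, `is_chained`, `render`) are hypothetical illustrations, not Theoria's actual data model.

```python
# Hypothetical sketch: proof states as sets of facts, transitions licensed by a
# justification. Names and representation are assumptions, not Theoria's API.
from dataclasses import dataclass
from enum import Enum


class JustificationKind(Enum):
    # Only the two kinds named in the description; a real system may have more.
    CITATION = "citation"
    COMPUTATION = "computation"


@dataclass(frozen=True)
class Justification:
    kind: JustificationKind
    detail: str  # e.g. the cited source or the computation performed


@dataclass(frozen=True)
class Transition:
    before: frozenset[str]         # facts established before this step
    after: frozenset[str]          # facts established after this step
    justification: Justification   # why the change from before to after is allowed


def is_chained(steps: list[Transition]) -> bool:
    """A trace is well formed if each step starts where the previous one ended."""
    return all(prev.after == nxt.before for prev, nxt in zip(steps, steps[1:]))


def render(steps: list[Transition]) -> str:
    """Render the trace as numbered lines, one per step, each open to challenge."""
    lines = []
    for i, step in enumerate(steps, start=1):
        added = sorted(step.after - step.before)
        j = step.justification
        lines.append(f"{i}. +{added} [{j.kind.value}: {j.detail}]")
    return "\n".join(lines)


# Example trace: factor 12, then apply the divisor-count formula.
s0 = frozenset({"n = 12"})
s1 = s0 | {"n = 2^2 * 3"}
s2 = s1 | {"n has 6 divisors"}
trace = [
    Transition(s0, s1, Justification(JustificationKind.COMPUTATION, "factor 12")),
    Transition(s1, s2, Justification(JustificationKind.CITATION, "divisor-count formula")),
]
print(is_chained(trace))  # True
print(render(trace))
```

Because each rendered line names exactly what it added and why, a reader can dispute one step without re-reading the whole solution.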

The system enforces completeness of change: every difference between consecutive proof states must be accounted for, which surfaces hidden premises.
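One way to picture the completeness-of-change requirement is as a set difference between consecutive states, checked against what the step's justification claims to change. The sketch below assumes each step declares the facts it adds and removes; `Step`, `unaccounted_changes`, and `is_complete` are hypothetical names, and this is an illustration of the idea rather than Theoria's actual check.

```python
# Hypothetical sketch of a completeness-of-change check; names and the
# "declared changes" model are assumptions, not Theoria's actual API.
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    before: frozenset[str]            # facts established before the step
    after: frozenset[str]             # facts established after the step
    declared_added: frozenset[str]    # facts the justification says it establishes
    declared_removed: frozenset[str] = frozenset()  # facts the step says it discharges


def unaccounted_changes(step: Step) -> dict[str, set[str]]:
    """Return every state difference that the step's justification does not explain."""
    added = step.after - step.before
    removed = step.before - step.after
    return {
        # Facts that appeared without a license: candidate hidden premises.
        "unlicensed_additions": set(added - step.declared_added),
        # Facts that disappeared without the step saying so.
        "silent_removals": set(removed - step.declared_removed),
    }


def is_complete(step: Step) -> bool:
    """A step is complete only if no difference is left unaccounted for."""
    return not any(unaccounted_changes(step).values())


# "x^2 > 0" is licensed, but "x is real" slips in silently.
leaky = Step(
    before=frozenset({"x > 0"}),
    after=frozenset({"x > 0", "x is real", "x^2 > 0"}),
    declared_added=frozenset({"x^2 > 0"}),
)
print(unaccounted_changes(leaky))  # {'unlicensed_additions': {'x is real'}, 'silent_removals': set()}

# Stating the premise up front, in the prior state, makes the step pass.
explicit = Step(
    before=frozenset({"x > 0", "x is real"}),
    after=frozenset({"x > 0", "x is real", "x^2 > 0"}),
    declared_added=frozenset({"x^2 > 0"}),
)
print(is_complete(explicit))  # True
```

The failing case shows the intended effect: an assumption that was never stated shows up as an unlicensed addition instead of passing unnoticed.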
Theoria produces human-readable proof traces in which each step can be independently challenged.

On HLE-Verified Gold, Theoria certifies 105 of 185 problems with 91.4% strict precision.
On GPQA Diamond, its certified precision reaches 97.1%.
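The HLE-Verified Gold figures can be related with simple arithmetic: coverage is the share of problems certified, and precision applies only to those certified problems. The sketch below assumes strict precision means correct certified answers divided by certified answers; the exact definition is not given here, so the derived count is illustrative only.

```python
# Arithmetic on the reported HLE-Verified Gold figures.
# Assumption: precision = correct certified answers / certified answers.
TOTAL = 185        # problems in HLE-Verified Gold (from the text)
CERTIFIED = 105    # problems Theoria certifies (from the text)
PRECISION = 91.4   # strict precision on certified problems, in percent (from the text)

coverage = 100 * CERTIFIED / TOTAL
uncertified = TOTAL - CERTIFIED

# Which integer counts of correct certified answers round to the reported precision?
consistent = [k for k in range(CERTIFIED + 1) if round(100 * k / CERTIFIED, 1) == PRECISION]

print(f"coverage: {coverage:.1f}%")   # coverage: 56.8%
print(f"uncertified: {uncertified}")  # uncertified: 80
print(f"correct certified answers consistent with {PRECISION}%: {consistent}")  # [96]
```

Under that assumption, the 80 uncertified problems are abstentions rather than errors, which is why precision and coverage have to be read together.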

This lets users check correctness by inspecting individual steps rather than trusting an opaque score, complementing rather than replacing holistic LLM judges.

Benchmarks

Benchmark	System	Metric	Value
GPQA Diamond	Theoria	Certified precision	97.1%
HLE-Verified Gold (Humanity's Last Exam)	Theoria	Strict precision (105 of 185 problems certified)	91.4%