Graph-PRefLexOR uses graph-native RL to improve traceable hypothesis generation

Researchers developed Graph-PRefLexOR, a family of graph-native reasoning models fine-tuned with Group Relative Policy Optimization (GRPO) to organize reasoning into explicit phases for mechanism exploration and hypothesis synthesis. This design links neural language generation with symbolic relational structure, enabling causal connections to be constructed, inspected, and reused.
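For readers unfamiliar with GRPO, its core idea is that each sampled response is scored relative to the other responses drawn for the same prompt, rather than against a learned value baseline. The following is a minimal sketch of that group-relative advantage step only, not the authors' training code: the function name, the `eps` constant, and the use of the population standard deviation are illustrative assumptions, and the reward values are made up.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt.

    Hypothetical helper: each advantage is (reward - group mean) divided by
    (group standard deviation + eps). The population standard deviation is an
    assumption here; implementations differ on this detail.
    """
    if not rewards:
        raise ValueError("rewards must contain at least one value")
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every response gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four sampled responses to a single prompt.
example_rewards = [1.0, 0.0, 0.0, 1.0]
example_advantages = group_relative_advantages(example_rewards)
```

Responses scoring above the group mean receive positive advantages and those below receive negative ones, so the policy update favors the better responses within each group. When all rewards in a group are equal, every advantage is zero and that group contributes no learning signal.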

On 100 open-ended questions from the materials science and mechanics literature, Graph-PRefLexOR improves on its corresponding base models by 40-65%, with the largest gains in reasoning traceability.
Embedding analyses show broader semantic exploration and approximately 2-3 times greater semantic diversity than baselines.
Semantic backtracking and layer-wise hidden-state analyses further show stronger alignment between structured reasoning and final answers.
Test-time graph expansion reveals that additional compute primarily increases long-range conceptual recombination within a bounded semantic space, rather than simply expanding semantic coverage.
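The semantic diversity comparison above depends on how diversity is measured in embedding space, which this summary does not specify. One common choice, shown here purely as an assumption, is the mean pairwise cosine distance between the embeddings of a model's responses. The function names and toy vectors below are hypothetical; real embeddings would come from a sentence-embedding model.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def mean_pairwise_cosine_distance(embeddings):
    """Average cosine distance over all unordered pairs of embeddings.

    Illustrative diversity score (an assumption, not the paper's metric):
    larger values mean the responses are spread more widely in the space.
    """
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        return 0.0  # fewer than two embeddings: no spread to measure
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


# Toy 2-D vectors standing in for response embeddings.
narrow = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]  # all point the same way
broad = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # spread across directions
diversity_ratio_inputs = (
    mean_pairwise_cosine_distance(narrow),
    mean_pairwise_cosine_distance(broad),
)
```

Under a metric of this kind, a statement such as "2-3 times greater semantic diversity" would correspond to the ratio of the fine-tuned model's score to the baseline's score on the same set of questions.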

These results establish graph-native reinforcement learning as a pathway toward interpretable AI systems for hypothesis generation in materials design and other scientific domains.