PeerCheck analyzes how LLM-generated academic reviews differ from human reviews, finding that LLMs focus on theory while human reviewers prioritize methodology and experiments. The framework applies prompt-engineering techniques such as Chain-of-Thought (CoT) prompting and retrieval-augmented generation (RAG). CoT significantly improves review quality, whereas RAG introduces an unexpected 'paradox': it sometimes reduces quality.
PeerCheck: Improving LLM-Generated Academic Reviews