This paper introduces Poller (Poetry LLM Evaluator), a novel method that uses large language models (LLMs) to evaluate poetry understanding by emulating human judgment through role-playing. The LLM is asked to adopt the perspective of the poem's author, drawing on detailed information to do so, with the aim of bridging the gap between automated efficiency and human expertise.

  • Poller reduces the error of LLM evaluations relative to human evaluations by having the model play the role of the poem's author.
  • The method evaluates poem interpretations across eight specialized dimensions.
  • For rhetorical techniques, Poller-based LLMs achieve a 94.55% error reduction compared to baseline methods.
  • For defamiliarization, the method achieves an 89.53% error reduction over conventional evaluation approaches.
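
To make the "error reduction" figures concrete, here is a minimal sketch of how such a percentage could be computed. This summary does not define the metric, so the sketch assumes that error is the mean absolute difference between LLM scores and human scores, and that reduction is measured relative to a baseline's error. The function names and all scores are hypothetical and are not taken from the paper.

```python
from statistics import mean


def mean_abs_error(model_scores, human_scores):
    """Mean absolute difference between model and human scores.

    Assumption: the paper's 'error' may be defined differently.
    """
    if len(model_scores) != len(human_scores):
        raise ValueError("score lists must have the same length")
    return mean(abs(m - h) for m, h in zip(model_scores, human_scores))


def error_reduction_pct(baseline_error, method_error):
    """Relative error reduction over a baseline, as a percentage."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return 100.0 * (baseline_error - method_error) / baseline_error


# Hypothetical scores for four interpretations on a single dimension.
human = [4, 3, 5, 2]
baseline = [2, 5, 3, 4]    # a plain LLM judge (illustrative values)
role_play = [4, 3, 4, 2]   # an LLM judge playing the author (illustrative values)

baseline_err = mean_abs_error(baseline, human)
role_play_err = mean_abs_error(role_play, human)
reduction = error_reduction_pct(baseline_err, role_play_err)
print(f"error reduction: {reduction:.2f}%")
```

Under this reading, a reduction of 94.55% would mean the role-playing evaluator's error is roughly one twentieth of the baseline's error on that dimension.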

This work establishes a foundation for automated evaluation in poetry-related tasks by effectively combining the efficiency of LLMs with the nuance of human expertise.