Poller: Are LLMs Suitable for Evaluating the Poetry Understanding Task?

This paper introduces Poller (Poetry LLM Evaluator), a method that uses large language models to evaluate poetry understanding by emulating human judgment through role-playing. The LLM is asked to adopt the perspective of the poem's author and is given detailed information to work from, with the aim of bridging the gap between automated efficiency and human expertise.

Poller reduces the error of LLM evaluations relative to human evaluations by having the model play the role of the poem's author.
The method evaluates poem interpretations across eight specialized dimensions.
For rhetorical techniques, Poller-based LLMs achieve a 94.55% error reduction compared to baseline methods.
For defamiliarization, the method achieves an 89.53% error reduction over conventional evaluation approaches.
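
The summary does not say how evaluation error or error reduction is computed. As a rough illustration only, the sketch below assumes that error is the mean absolute difference between LLM and human scores on one dimension, and that error reduction is the relative drop in that error against a baseline. The function names and all scores are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def mean_absolute_error(llm_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Average absolute gap between LLM and human scores (assumed error metric)."""
    if len(llm_scores) != len(human_scores) or len(llm_scores) == 0:
        raise ValueError("score lists must be non-empty and of equal length")
    total = sum(abs(l - h) for l, h in zip(llm_scores, human_scores))
    return total / len(llm_scores)


def relative_error_reduction(baseline_error: float, method_error: float) -> float:
    """Percentage by which method_error is lower than baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - method_error) / baseline_error


# Hypothetical scores for one dimension; illustrative values, not data from the paper.
human = [4, 3, 5, 2]
baseline_llm = [2, 5, 3, 4]   # LLM scores without role-playing (assumed baseline)
roleplay_llm = [4, 3, 4, 2]   # LLM scores when playing the poem's author

baseline_err = mean_absolute_error(baseline_llm, human)   # 2.0
roleplay_err = mean_absolute_error(roleplay_llm, human)   # 0.25
reduction = relative_error_reduction(baseline_err, roleplay_err)
print(f"error reduction: {reduction:.2f}%")  # error reduction: 87.50%
```

Under this reading, a figure such as 94.55% would mean that the role-playing evaluator's error is roughly one twentieth of the baseline's on that dimension.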

This work establishes a foundation for automated evaluation in poetry-related tasks by effectively combining the efficiency of LLMs with the nuance of human expertise.