Large language models can generate median-quality legal text, but no benchmark evaluates their ability to perform doctrinal legal reasoning. This gap undermines the EU AI Act's requirement of 'appropriate accuracy' in judicial AI, because any operational definition of that requirement would need a doctrinal-reasoning evaluation standard, and none exists.
Measurement Gap in EU Law Automation