Measurement Gap in EU Law Automation
Large language models can produce legal text of median quality, but no benchmark evaluates their ability to perform doctrinal legal reasoning. This gap undermines the EU AI Act's requirement that high-risk systems, including AI used in the administration of justice, achieve an 'appropriate level of accuracy': without a doctrinal-reasoning evaluation, that accuracy cannot be measured or demonstrated.