LLMs Benchmarked for Web Vulnerability Detection
A study evaluates six LLMs on detecting real-world web vulnerabilities in WordPress plugins and finds that detection rates vary by model and prompt design. Claude Opus 4.6 achieved the highest detection rate at 63%, while Qwen 3.5 reached only 35%. No model consistently identified all baseline vulnerabilities across iterations.
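
The summary does not say exactly how the study computes its metrics. As a rough sketch, assume the detection rate is the share of known baseline vulnerabilities that a model reports in one run, and that a vulnerability counts as consistently identified only if it is reported in every run. The function names and vulnerability identifiers below are hypothetical and used only for illustration; they are not data from the study.

```python
from typing import Iterable


def detection_rate(found: set[str], baseline: set[str]) -> float:
    """Fraction of baseline vulnerabilities that appear in `found`."""
    if not baseline:
        raise ValueError("baseline must not be empty")
    return len(found & baseline) / len(baseline)


def consistently_found(runs: Iterable[set[str]], baseline: set[str]) -> set[str]:
    """Baseline vulnerabilities that were reported in every run."""
    result = set(baseline)
    for run in runs:
        result &= run
    return result


# Made-up identifiers for illustration only; not results from the study.
baseline = {"vuln-a", "vuln-b", "vuln-c", "vuln-d"}
runs = [
    {"vuln-a", "vuln-b", "vuln-c"},
    {"vuln-a", "vuln-c", "vuln-d"},
    {"vuln-a", "vuln-b"},
]

rates = [detection_rate(run, baseline) for run in runs]
stable = consistently_found(runs, baseline)

print(rates)           # [0.75, 0.75, 0.5]
print(sorted(stable))  # ['vuln-a']
```

In this toy data every run finds at least half of the baseline, yet only one vulnerability is reported in all three runs. That shows how a model can post a reasonable per-run detection rate while still failing to identify the full baseline consistently.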