Lab · xAI
arxiv arXiv cs.LG · 8d ago

LegalHalluLens: Auditing Hallucinations in Legal AI

LegalHalluLens introduces a framework to audit AI hallucinations in legal contexts by analyzing typed hallucination profiles across four claim categories. It reveals a 38-40 point gap between obligation/numeric and temporal claims, and shows two systems with identical 52% hallucination rates can have opposite risk directions. The framework uses a Risk Direction Index and calibrated debate pipelines to reduce fabricated detections by 45%, offering actionable diagnostics for trustworthy legal AI deployment.

media Latent Space · 6d ago

Why AI Scaling Is a Systems Problem, Not Just a GPU Race

The AI scaling debate overlooks that maximizing model FLOP utilization is more critical than buying more GPUs. Frontiers like xAI operate at sub-10% MFU, while historical models achieved 21% to 70% MFU, indicating systemic inefficiencies in scheduling, networking, and cluster management. Anjney Midha argues that AI infrastructure must evolve into efficient, aligned, and responsible systems, with 'output maxing' emerging as a new discipline for frontier AI.