xAI — korshunov.ai

LegalHalluLens: Auditing Hallucinations in Legal AI

LegalHalluLens introduces a framework for auditing AI hallucinations in legal contexts by analyzing typed hallucination profiles across four claim categories. It reveals a 38-40 point gap between obligation/numeric claims and temporal claims, and shows that two systems with identical 52% hallucination rates can have opposite risk directions. The framework uses a Risk Direction Index and calibrated debate pipelines to reduce fabricated detections by 45%, offering actionable diagnostics for trustworthy legal AI deployment.

arxiv arXiv cs.AI · 1d ago

LLMs Benchmarked for Web Vulnerability Detection

A study evaluates six LLMs on detecting real-world vulnerabilities in WordPress plugins and finds that detection rates vary by model and prompt design. Claude Opus 4.6 achieved the highest detection rate at 63%, while Qwen 3.5 reached only 35%, and no model consistently identified all baseline vulnerabilities across iterations.

media Latent Space · 6d ago

Why AI Scaling Is a Systems Problem, Not Just a GPU Race

The AI scaling debate overlooks that maximizing model FLOPs utilization (MFU) matters more than buying more GPUs. Frontier labs like xAI operate at sub-10% MFU, while historical models achieved 21% to 70% MFU, which points to systemic inefficiencies in scheduling, networking, and cluster management. Anjney Midha argues that AI infrastructure must evolve into efficient, aligned, and responsible systems, with 'output maxing' emerging as a new discipline for frontier AI.
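
For readers unfamiliar with the metric, MFU is the ratio of the FLOP/s a training run actually achieves to the theoretical peak of the hardware it occupies. The sketch below is a rough illustration, not a method taken from the article: it assumes the common approximation of about 6 FLOPs per parameter per token for a dense transformer (forward plus backward pass, attention cost ignored), and every number in the usage example is hypothetical.

```python
def mfu(tokens_per_second: float, n_params: float,
        n_gpus: int, peak_flops_per_gpu: float) -> float:
    """Model FLOPs utilization: achieved FLOP/s divided by peak FLOP/s."""
    # Assumption: ~6 FLOPs per parameter per token for dense-transformer
    # training (forward + backward); attention FLOPs are ignored.
    achieved_flops_per_second = 6.0 * n_params * tokens_per_second
    peak_flops_per_second = n_gpus * peak_flops_per_gpu
    return achieved_flops_per_second / peak_flops_per_second


# Hypothetical cluster: 70B-parameter model, 1024 accelerators,
# 1e15 peak FLOP/s each, 200k tokens/s of training throughput.
example = mfu(tokens_per_second=200_000, n_params=70e9,
              n_gpus=1024, peak_flops_per_gpu=1e15)
print(f"MFU = {example:.1%}")
```

Under these made-up numbers the run lands at roughly 8% MFU. Doubling throughput on the same hardware doubles MFU, which is the sense in which utilization is a systems problem rather than a GPU-count problem.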

arxiv arXiv cs.AI · 9d ago

TokenPilot: Cache-Efficient Context Management for LLM Agents

TokenPilot reduces inference costs by 61% to 87% in both isolated and continuous modes, outperforming prior systems in cost efficiency while maintaining competitive performance. It uses ingestion-aware compaction and lifecycle-aware eviction to preserve prompt cache continuity and minimize token footprint without introducing prefix mismatches.
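
To illustrate why prefix mismatches matter, the sketch below measures how much of a cached prompt a new request can reuse. It is a generic illustration of prefix-based prompt caching, not TokenPilot's algorithm; the function name `shared_prefix_len` and the toy context segments are hypothetical.

```python
from typing import Sequence


def shared_prefix_len(cached: Sequence[str], new: Sequence[str]) -> int:
    """Count the leading items that are identical in both sequences."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break  # the first mismatch ends the reusable prefix
        n += 1
    return n


# Hypothetical context segments standing in for token runs.
cached = ["system", "tool_a", "tool_b", "user_1"]

# Appending to the end keeps the whole cached prefix reusable.
appended = cached + ["user_2"]

# Rewriting early context invalidates everything after the edit point.
rewritten = ["system", "summary", "user_1", "user_2"]

print(shared_prefix_len(cached, appended))   # 4
print(shared_prefix_len(cached, rewritten))  # 1
```

In this toy case, compacting early context shrinks the prompt but leaves only one segment reusable from the cache, so any compaction or eviction scheme has to weigh the smaller token footprint against the lost cache hits.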

media r/LocalLLaMA · 4d ago

Claude Will Soon Require Identity Verification

Anthropic will soon require users to verify their identity to access Claude. The change is intended to enhance security and ensure responsible use of the platform.