Lab · Cohere
arXiv cs.CL · 8d ago

LegalHalluLens: Auditing Hallucinations in Legal AI

LegalHalluLens introduces a framework for auditing AI hallucinations in legal contexts by analyzing typed hallucination profiles across four claim categories. It reveals a 38–40 point gap between obligation/numeric claims and temporal claims, and shows that two systems with identical 52% hallucination rates can have opposite risk directions. The framework uses a Risk Direction Index and calibrated debate pipelines to reduce fabricated detections by 45% and to improve accountability in legal AI deployment.
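To see why a single aggregate rate can hide very different behavior, here is a minimal sketch of a typed hallucination profile: per-category hallucination rates computed from claim-level labels. The function names, the toy counts, and the `other` category (a placeholder for the fourth category, which this summary does not name) are all hypothetical, and the sketch does not implement the paper's Risk Direction Index.

```python
from collections import defaultdict


def typed_profile(labels):
    """Compute per-category and aggregate hallucination rates.

    labels: iterable of (category, is_hallucinated) pairs, one per audited claim.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for category, hallucinated in labels:
        totals[category] += 1
        errors[category] += int(hallucinated)
    if not totals:
        raise ValueError("no labeled claims")
    per_category = {c: errors[c] / totals[c] for c in totals}
    aggregate = sum(errors.values()) / sum(totals.values())
    return per_category, aggregate


def make_labels(hallucinated_by_category, claims_per_category=25):
    """Toy helper: k hallucinated claims out of claims_per_category per category."""
    return [
        (category, i < k)
        for category, k in hallucinated_by_category.items()
        for i in range(claims_per_category)
    ]


# Toy numbers, not the paper's data. "other" stands in for the unnamed fourth category.
system_a, rate_a = typed_profile(
    make_labels({"obligation": 20, "numeric": 20, "temporal": 6, "other": 6}))
system_b, rate_b = typed_profile(
    make_labels({"obligation": 6, "numeric": 6, "temporal": 20, "other": 20}))
print(f"A: aggregate={rate_a:.2f} profile={system_a}")
print(f"B: aggregate={rate_b:.2f} profile={system_b}")
```

Both toy systems hallucinate on 52 of 100 claims, yet A fails mostly on obligation and numeric claims while B fails mostly on temporal ones, which is the kind of difference an aggregate score cannot show.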

AI News (smol.ai) · 4d ago

GLM-5.2 Breakout and Open-Model Progress Highlighted

Zhipu's GLM-5.2 emerged as the top open-weight model, praised for frontier-adjacent performance in daily use, stronger coding, and reduced 1M-token inference cost via IndexShare. It outperformed other open models on agentic knowledge-work benchmarks, reaching 1266 Elo on Artificial Analysis's AA-Briefcase test. However, top models fully satisfied only 3% of tasks, indicating persistent challenges for long-horizon agents in real-world work.
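An Elo number is only meaningful relative to other ratings. As a rough guide, the sketch below computes the expected head-to-head score under the standard logistic Elo model with a 400-point scale; the summary does not say how AA-Briefcase derives its ratings, so treating it as standard Elo is an assumption, and the 1166-rated opponent is hypothetical.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score (win probability, counting draws as half) of A against B
    under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Hypothetical opponent rated 100 points below the reported 1266.
print(round(elo_expected_score(1266, 1166), 3))
```

Under this model a 100-point lead corresponds to an expected score of roughly 0.64, and a 400-point lead to about 0.91.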

arXiv cs.LG · 6d ago

Training LLMs for Long-Lifecycle Agents via Cross-Domain Generalization

A new framework enables large language models to develop a "Connect the Dots" capability, allowing long-lifecycle agents to learn from experience and iteratively update their environment context. The framework uses reinforcement learning with long rollout sequences and custom tasks to promote cross-domain generalization, and shows effective out-of-distribution performance in both cross-domain and transition settings.
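The idea of an agent that carries lessons forward and revises its environment context can be illustrated with a small loop. This is a minimal sketch, not the paper's training method (it has no reinforcement learning): `EnvironmentContext`, `run_lifecycle`, the `reflect` callback, and the recency-based eviction policy are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class EnvironmentContext:
    """Hypothetical bounded memory of lessons carried across episodes."""
    max_notes: int = 5
    notes: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.max_notes < 1:
            raise ValueError("max_notes must be at least 1")

    def update(self, lesson: str) -> None:
        # Re-adding a known lesson moves it to the most recent position.
        if lesson in self.notes:
            self.notes.remove(lesson)
        self.notes.append(lesson)
        # Evict the oldest lessons once the budget is exceeded.
        if len(self.notes) > self.max_notes:
            self.notes = self.notes[-self.max_notes:]

    def render(self) -> str:
        """Format the lessons as text that could be prepended to a prompt."""
        return "\n".join(f"- {note}" for note in self.notes)


def run_lifecycle(tasks: Iterable[str],
                  reflect: Callable[[str, str], str],
                  context: EnvironmentContext) -> EnvironmentContext:
    """Run tasks in order, passing the current context to `reflect` and
    folding the lesson it returns back into the context."""
    for task in tasks:
        lesson = reflect(task, context.render())
        context.update(lesson)
    return context


# Stub reflection function standing in for a model call.
ctx = run_lifecycle(["t1", "t2", "t3"],
                    lambda task, current: f"lesson from {task}",
                    EnvironmentContext(max_notes=2))
print(ctx.render())
```

In a real agent, `reflect` would be a model call that reads the rendered context and distills a new lesson from the episode; the bounded, recency-ordered store is just one simple way to keep that context from growing without limit.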

arXiv cs.AI · 7d ago

CAPRA: Multi-Agent LLM System for Software Architecture Feedback

CAPRA is a multi-agent LLM system that generates personalized, template-compliant LaTeX feedback on software architecture deliverables. It uses specialized agents, PyMuPDF, and gpt-4o to extract and analyze text and UML diagrams, with evidence anchoring and consistency management to ensure reliability. A preliminary evaluation on 10 student reports shows that CAPRA met 88.8% of eight criteria and achieved moderate inter-rater agreement (kappa = 0.582), processing each report in under 4 minutes.
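For readers unfamiliar with the agreement statistic, the sketch below computes Cohen's kappa for two raters: observed agreement corrected for the agreement expected by chance from each rater's label frequencies. The summary does not say which kappa variant CAPRA's evaluation used, so Cohen's two-rater kappa is an assumption here, and the example labels are made up.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is the agreement expected from each rater's marginal label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Made-up judgments from two raters on four criteria.
print(cohens_kappa(["met", "met", "not met", "not met"],
                   ["met", "not met", "not met", "not met"]))
```

Here the raters agree on 3 of 4 items (p_o = 0.75) but would agree on half by chance (p_e = 0.5), giving kappa = 0.5. On the commonly used Landis and Koch scale, values from 0.41 to 0.60, such as the reported 0.582, are labeled "moderate". scikit-learn's `sklearn.metrics.cohen_kappa_score` provides an equivalent library implementation.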