AI agents
arXiv cs.AI · 8d ago

LegalHalluLens: Auditing Hallucinations in Legal AI

LegalHalluLens introduces a framework to audit AI hallucinations in legal contexts by analyzing typed hallucination profiles across four claim categories. It reveals a 38-40 point gap between obligation/numeric and temporal claims, and shows two systems with identical 52% hallucination rates can have opposite risk directions. The framework uses a Risk Direction Index and calibrated debate pipelines to reduce fabricated detections by 45% and improve accountability in legal AI deployment.
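To make the idea of a typed profile concrete, here is a minimal sketch showing how two systems can share one aggregate hallucination rate while differing sharply by claim type. It is an illustration only: the counts are invented toy data, the fourth category name `other` is a placeholder (the summary names only obligation, numeric, and temporal claims), and the sketch does not reproduce the paper's Risk Direction Index, whose definition is not given here.

```python
from collections import Counter


def typed_profile(records):
    """Return {claim_type: hallucination rate} from (claim_type, is_hallucinated) pairs."""
    totals, bad = Counter(), Counter()
    for claim_type, is_hallucinated in records:
        totals[claim_type] += 1
        bad[claim_type] += int(is_hallucinated)
    return {t: bad[t] / totals[t] for t in totals}


def overall_rate(records):
    """Aggregate hallucination rate, ignoring claim type."""
    return sum(int(h) for _, h in records) / len(records)


def make_records(bad_counts, per_type=25):
    """Build toy records: per_type claims per category, bad_counts[t] of them hallucinated."""
    records = []
    for claim_type, n_bad in bad_counts.items():
        records += [(claim_type, True)] * n_bad
        records += [(claim_type, False)] * (per_type - n_bad)
    return records


# Invented toy data: both systems hallucinate on 52 of 100 claims.
system_a = make_records({"obligation": 20, "numeric": 18, "temporal": 8, "other": 6})
system_b = make_records({"obligation": 6, "numeric": 8, "temporal": 18, "other": 20})

for name, system in (("A", system_a), ("B", system_b)):
    print(name, overall_rate(system), typed_profile(system))
```

Both toy systems report the same headline rate, yet their per-type profiles are mirror images, which is the kind of difference a single aggregate number hides.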

arXiv cs.AI · 8d ago

ProvenanceGuard: Source-Aware Factuality Verification for MCP-Based LLM Agents

ProvenanceGuard introduces a source-aware verifier for MCP-based LLM agents that detects cross-source conflation by routing claims to specific evidence sources and comparing stated attribution with actual source ownership. It achieves block F1 of 0.802 and source accuracy of 0.858 on 260 source-eligible claims, outperforming source-blind baselines, and detects all injected attribution swaps in 50 clinical probes.
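The core check described above, comparing the source a claim is attributed to with the source that actually supports it, can be sketched as follows. This is not ProvenanceGuard's implementation: the `Claim` type, the `verify` function, the source names, and the verbatim substring match used as the evidence test are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    stated_source: str  # the source the agent says the claim came from


def supporting_sources(claim, sources):
    """Toy evidence test: a source supports a claim if it contains the claim text
    verbatim (case-insensitive). A real verifier would use a stronger check."""
    needle = claim.text.lower()
    return {name for name, doc in sources.items() if needle in doc.lower()}


def verify(claim, sources):
    """Block claims that no source supports or that are credited to the wrong source."""
    owners = supporting_sources(claim, sources)
    if not owners:
        return "block: unsupported"
    if claim.stated_source not in owners:
        return "block: misattributed"
    return "pass"


# Hypothetical tool outputs, keyed by source name.
sources = {
    "lab_results": "Potassium 5.9 mmol/L. Creatinine 1.1 mg/dL.",
    "medication_list": "Lisinopril 10 mg daily. Metformin 500 mg twice daily.",
}

print(verify(Claim("potassium 5.9 mmol/L", "lab_results"), sources))
print(verify(Claim("potassium 5.9 mmol/L", "medication_list"), sources))
print(verify(Claim("sodium 128 mmol/L", "lab_results"), sources))
```

A source-blind check would pass the second claim, because the text does appear somewhere in the evidence; only the comparison against the stated source flags the swap.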

arXiv cs.AI · 8d ago

Agentic AI Framework Reduces Diagnostic Errors in Healthcare

A multi-agent AI framework addresses premature diagnostic handoff and silent hallucinations in healthcare by enforcing structured clinical protocol completion and epistemic uncertainty quantification. Evaluations on 150 simulated cases show 49.3% diagnostic precision, an 11.3 percentage point improvement over baseline, with a statistically significant negative correlation between OLDCARTS completeness and diagnostic uncertainty.

arXiv cs.AI · 9d ago

Meta-Knowledge Reutilization in Reinforcement Learning

A new framework learns task-level knowledge on a simplified agent and transfers it to heterogeneous agents. It uses Bayesian non-parametric priors and a high-level policy to generate task guidance, with a semantic-magnitude interface and temporal adaptor to align meta-knowledge with embodiment-specific controllers. Experiments show a 94.75% to 99.79% reduction in final-step tracking error, and performance comparable to state-of-the-art methods using 23.8% of their interaction data.

arXiv cs.AI · 9d ago

Flash Endurance as Depreciating Capital in Robot Memory

A robot's flash memory endurance is a non-renewable asset that degrades with each write. A wear-aware pricing model introduces a shadow price $η$ to guide memory placement across RAM, NVM, and cloud, with optimal routing depending on the value-write association $χ$. Empirical measurements show that $χ$ is positive in long-horizon manipulation, null in short-horizon tasks, and negative in teleoperation. The endurance budget binds only on low-end QLC/eMMC memory; there, wear-aware control changes routing according to task value but does not improve performance.
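A shadow price on wear can be pictured as an extra per-write charge applied when choosing a storage tier. The sketch below is a toy model under stated assumptions, not the paper's formulation: the `TIERS` table, its numbers, and the `route` function are invented for illustration, and `eta` stands in for $η$.

```python
# Hypothetical tier table: (fraction of item value retained,
#                           fixed cost per item,
#                           flash wear incurred per byte written)
TIERS = {
    "ram":   (0.3, 0.0, 0.0),  # cheap, but most value is lost when volatile memory clears
    "nvm":   (1.0, 0.1, 1.0),  # keeps full value, consumes flash endurance
    "cloud": (0.9, 0.5, 0.0),  # no local wear, higher fixed cost
}


def route(value, write_bytes, eta):
    """Pick the tier with the highest net benefit, charging eta per unit of flash wear."""
    def net(tier):
        retained, fixed_cost, wear_per_byte = TIERS[tier]
        return retained * value - fixed_cost - eta * wear_per_byte * write_bytes
    return max(TIERS, key=net)


print(route(value=1.0, write_bytes=100, eta=0.0))    # wear is free
print(route(value=1.0, write_bytes=100, eta=0.01))   # wear is priced, low-value item
print(route(value=10.0, write_bytes=100, eta=0.01))  # wear is priced, high-value item
```

With these toy numbers, a zero price sends everything to NVM, while a positive price diverts the low-value item away from flash and still admits the high-value one, which is the qualitative behavior a wear-aware price is meant to produce.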

arXiv cs.AI · 9d ago

IUU+DB: LLM-Driven Database for Illegal Fishing and Supply Chain Crimes

IUU+DB is a large language model-driven system that tracks illegal, unreported, and unregulated fishing, seafood fraud, and labor abuse. It extracts key data elements from diverse documents, classifies relevant incidents, and enables trend analysis to identify geographic and behavioral hotspots. The system supports research, risk assessments, and policy enforcement in fisheries and supply chains.