Safety & alignment
arXiv cs.CL · 7d ago

LLM-based Metrics Improve Clinical Significance Evaluation in Radiology

A study introduces lightweight, interpretable LLM-based metrics that sharpen the boundary between clinically significant errors and harmless variations in radiology reports. The metrics outperform large medical LLMs and rival proprietary models, and one-pass training proves effective for cost-sensitive deployment. A two-pass setting does not consistently improve performance; instead, it shifts the focus from error detection toward robustness.

arXiv cs.CL · 7d ago

Index Sickness Elimination via Baseline-Log Physical Separation

In a 391-session AI collaboration project, LLMs exhibited 'Index Sickness': a failure mode in which symbolic complexity leads to self-referential outputs disconnected from reality. The 'Pang Principle' holds that natural language conveys meaning with higher semantic quality than symbolic systems do. The 'Baseline-Log Physical Separation' mechanism reduced AI instruction volume by 75% and prevented Index Sickness from recurring in subsequent sessions.

arXiv cs.CL · 7d ago

Human-AI Coevolution Framework Reveals Social Intelligence Emergence

The Human-AI Coevolution Dynamics Framework (HACD-H) introduces a unified model of long-term human-AI interaction that integrates emotional adaptation, memory, and personality into a self-organizing social cognitive system. Results show that social intelligence emerges through coevolution: it correlates significantly and negatively with social cognitive energy (r = -0.391, p < 0.001), and energy decreases progressively over time along interaction trajectories.
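
For readers less familiar with the reported statistic, here is a minimal sketch of how a Pearson correlation coefficient r and the t-statistic behind its significance test are computed. The two per-session series are made-up placeholder values, not data from the HACD-H paper, and the sketch does not reproduce the authors' analysis.

```python
import math
import statistics

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation: covariance divided by the product of the spreads."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two equal-length series with at least 3 points")
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def t_statistic(r: float, n: int) -> float:
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom under H0: rho = 0."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical per-session scores (placeholders, not the paper's data).
social_intelligence = [0.42, 0.47, 0.51, 0.55, 0.58, 0.63, 0.66, 0.70]
cognitive_energy = [0.90, 0.86, 0.88, 0.79, 0.81, 0.74, 0.77, 0.70]

r = pearson_r(social_intelligence, cognitive_energy)
t = t_statistic(r, len(social_intelligence))
print(f"r = {r:.3f}, t = {t:.2f} (df = {len(social_intelligence) - 2})")
```

The p-value then comes from a Student's t distribution with n - 2 degrees of freedom; `scipy.stats.pearsonr` returns both r and p in one call if SciPy is available.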

arXiv cs.AI · 7d ago

Towards an Agent-First Web: Redesigning the Web for AI Agents

A new paper proposes a fundamental redesign of the web that prioritizes AI agent access, challenging the long-held assumption that humans are the web's primary users. It introduces reforms at the access, economic, and content layers (including agent-identifiable HTTP headers, intent-based subscription models, and a cryptographic provenance system) that would make AI agents first-class participants, with human supervision and accountability built into the architecture.
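
To make two of the proposed mechanisms concrete, the sketch below classifies requests by an agent-identification header and attaches a provenance tag to content. Both are assumptions for illustration: the header name `X-Agent-Identity` is hypothetical, and the shared-secret HMAC from Python's standard library stands in for whatever signature scheme the paper actually proposes.

```python
import hashlib
import hmac

# Hypothetical header name; the paper proposes agent-identifiable HTTP headers,
# but this exact name is an illustrative assumption.
AGENT_HEADER = "X-Agent-Identity"

def classify_request(headers: dict[str, str]) -> str:
    """Return 'agent' if the (hypothetical) agent header is present and non-empty, else 'human'."""
    # HTTP field names are case-insensitive, so compare lowercased keys.
    lowered = {k.lower(): v for k, v in headers.items()}
    return "agent" if lowered.get(AGENT_HEADER.lower()) else "human"

def sign_content(content: bytes, key: bytes) -> str:
    """Provenance tag: HMAC-SHA256 over the content bytes (stand-in for a real signature)."""
    return hmac.new(key, content, hashlib.sha256).hexdigest()

def verify_content(content: bytes, key: bytes, tag: str) -> bool:
    """Recompute the tag and compare in constant time to avoid timing side channels."""
    return hmac.compare_digest(sign_content(content, key), tag)

key = b"publisher-secret"  # hypothetical shared secret
page = b"<p>Original article text</p>"
tag = sign_content(page, key)

print(classify_request({"x-agent-identity": "example-agent/1.0"}))  # agent
print(classify_request({"User-Agent": "Mozilla/5.0"}))              # human
print(verify_content(page, key, tag))                               # True
print(verify_content(b"<p>Tampered text</p>", key, tag))            # False
```

An HMAC only proves provenance to parties who hold the same secret, and any of them could forge a tag; a public provenance system would more plausibly use public-key signatures (for example Ed25519) so that agents can verify content without being able to sign it.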

Don't Worry About the Vase · 7d ago

No Jailbreak: Fable's 'Fix This Code' Was a Fake Scenario

The article confirms that there was no actual jailbreak of Anthropic's Fable AI. What took place was a test using fake code with planted vulnerabilities: Fable refused to review the code and responded to a request to 'fix this code' only after manual steps. Katie Moussouris of Luta Security says the scenario should not trigger export controls, describing it as a deliberate, engineered test, which undercuts claims of a security breach.

Interconnects · 8d ago

State of the Interconnects Blog Mid-2026

The author outlines three core goals: clarifying how frontier AI models are evolving, building an open AI ecosystem, and creating institutions to support these missions. Interconnects serves as a raw, independent voice on frontier AI, with a dedicated technical audience of over 70K subscribers. The blog keeps comments behind the paywall to keep out AI-generated noise, and the author aims to reach 1,000 paid subscribers by summer, emphasizing financial sustainability and independence as the cost of AI services rises.

arXiv cs.LG · 8d ago

LegalHalluLens: Auditing Hallucinations in Legal AI

LegalHalluLens introduces a framework for auditing AI hallucinations in legal contexts by analyzing typed hallucination profiles across four claim categories. It reveals a 38-40 point gap between obligation/numeric claims and temporal claims, and shows that two systems with identical 52% hallucination rates can have opposite risk directions. The framework uses a Risk Direction Index and calibrated debate pipelines to reduce fabricated detections by 45%, offering actionable diagnostics for trustworthy legal AI deployment.
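
The point that identical aggregate rates can hide very different typed profiles is easy to see with a toy calculation. The sketch below uses invented counts (25 claims per category, 52 hallucinations per system) and a placeholder name, `category_4`, for the fourth category, which the summary does not name. It does not implement the paper's Risk Direction Index, whose definition is not given here.

```python
from collections import Counter

def make_claims(error_counts: dict[str, int], per_category: int = 25) -> list[tuple[str, bool]]:
    """Build (category, hallucinated) pairs: error_counts[c] of per_category claims are hallucinated."""
    claims = []
    for category, k in error_counts.items():
        claims += [(category, True)] * k + [(category, False)] * (per_category - k)
    return claims

def hallucination_profile(claims: list[tuple[str, bool]]) -> tuple[float, dict[str, float]]:
    """Return (overall rate, {category: rate}) for a list of (category, hallucinated) pairs."""
    totals, errors = Counter(), Counter()
    for category, hallucinated in claims:
        totals[category] += 1
        errors[category] += int(hallucinated)
    overall = sum(errors.values()) / sum(totals.values())
    return overall, {c: errors[c] / totals[c] for c in totals}

# Invented counts for two hypothetical systems; "category_4" is a placeholder name.
system_a = make_claims({"obligation": 20, "numeric": 20, "temporal": 2, "category_4": 10})
system_b = make_claims({"obligation": 5, "numeric": 5, "temporal": 22, "category_4": 20})

for name, claims in [("A", system_a), ("B", system_b)]:
    overall, per_cat = hallucination_profile(claims)
    profile = ", ".join(f"{c}={r:.0%}" for c, r in per_cat.items())
    print(f"system {name}: overall={overall:.0%} | {profile}")
```

Both systems print `overall=52%`, yet A fails mostly on obligation and numeric claims while B fails mostly on temporal ones, so a single aggregate rate would rank them as equally risky.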