Safety & alignment
arxiv arXiv cs.CL · 7d ago

Index Sickness Elimination via Baseline-Log Physical Separation

In a 391-session AI collaboration project, LLMs exhibited 'Index Sickness', a failure mode in which symbolic complexity leads to self-referential outputs disconnected from reality. The paper's 'Pang Principle' asserts that natural language conveys higher semantic quality than symbolic systems. Its 'Baseline-Log Physical Separation' mechanism reduced AI instruction volume by 75% and eliminated recurrence of Index Sickness in subsequent sessions.

arxiv arXiv cs.CL · 7d ago

Human-AI Coevolution Framework Reveals Social Intelligence Emergence

The Human-AI Coevolution Dynamics Framework (HACD-H) introduces a unified model for long-term human-AI interaction, integrating emotional adaptation, memory, and personality into a self-organizing social cognitive system. Results show social intelligence emerges through coevolution, with a significant negative correlation between social intelligence and social cognitive energy (r = -0.391, p < 0.001), and progressive energy reduction over time in interaction trajectories.

arxiv arXiv cs.AI · 7d ago

Towards an Agent-First Web: Redesigning the Web for AI Agents

A new paper proposes a fundamental redesign of the web to prioritize AI agent access, challenging the long-held assumption that humans are the primary web users. It introduces access, economic, and content layer reforms—including agent-identifiable HTTP headers, intent-based subscription models, and a cryptographic provenance system—to enable AI agents as first-class participants, with human supervision and accountability embedded in the architecture.

media Don't Worry About the Vase · 7d ago

No Jailbreak: Fable's 'Fix This Code' Was a Fake Scenario

According to the article, there was no actual jailbreak of Anthropic's Fable AI. The incident was a test using fake code with planted vulnerabilities: Fable refused to review the code and responded to a 'fix this code' request only after manual steps. Katie Moussouris of Luta Security says the scenario should not trigger export controls, calling it a deliberate, engineered test, a characterization that undermines claims of a security breach.

media Interconnects · 7d ago

State of the Interconnects Blog Mid-2026

The author outlines three core goals: clarifying how frontier AI models are evolving, building an open AI ecosystem, and creating institutions to support these missions. Interconnects serves as a raw, independent voice for frontier AI thinking, with a dedicated technical audience of over 70K subscribers. The blog keeps comments paywalled to prevent AI-generated noise, and the author aims to reach 1,000 paid subscribers by summer, emphasizing financial sustainability and independence amid rising AI service costs.

arxiv arXiv cs.LG · 8d ago

LegalHalluLens: Auditing Hallucinations in Legal AI

LegalHalluLens introduces a framework to audit AI hallucinations in legal contexts by analyzing typed hallucination profiles across four claim categories. It reveals a 38-40 point gap between obligation/numeric and temporal claims, and shows two systems with identical 52% hallucination rates can have opposite risk directions. The framework uses a Risk Direction Index and calibrated debate pipelines to reduce fabricated detections by 45%, offering actionable diagnostics for trustworthy legal AI deployment.

arxiv arXiv cs.LG · 8d ago

Edge Flow: A Continuous-Time Model for Gradient Descent at Edge of Stability

Edge Flow is a tractable, predictive continuous-time model that captures gradient descent dynamics at the edge of stability (EoS). It decomposes the dynamics into center, oscillation direction, and magnitude, with self-stabilization of sharpness emerging from coupled feedback. The model requires only two gradient evaluations and one Hessian-vector product per iteration, and it outperforms prior models in tracking oscillations and explaining instabilities at EoS.