Safety & alignment
arxiv arXiv cs.AI · 7d ago

Towards an Agent-First Web: Redesigning the Web for AI Agents

A new paper proposes a fundamental redesign of the web to prioritize AI agent access, challenging the long-held assumption that humans are the primary web users. It introduces access, economic, and content layer reforms—including agent-identifiable HTTP headers, intent-based subscription models, and a cryptographic provenance system—to enable AI agents as first-class participants, with human supervision and accountability embedded in the architecture.

arxiv arXiv cs.AI · 7d ago

Human-AI Coevolution Framework Reveals Social Intelligence Emergence

The Human-AI Coevolution Dynamics Framework (HACD-H) introduces a unified model for long-term human-AI interaction, integrating emotional adaptation, memory, and personality into a self-organizing system. Results show social intelligence emerges through coevolution, with a significant negative correlation between social intelligence and social cognitive energy (r = -0.391, p < 0.001), and progressive energy reduction over time.

media Don't Worry About the Vase · 8d ago

No Jailbreak: Fable's 'Fix This Code' Was a Fake Scenario

The article confirms there was no actual jailbreak of Anthropic's Fable AI. Instead, testers ran a scenario built on fake code with planted vulnerabilities: Fable refused to review the code and responded to a request to 'fix this code' only after manual steps. Katie Moussouris of Luta Security states this scenario should not trigger export controls, calling it a deliberate, engineered test that undermines claims of a security breach.

media Interconnects · 8d ago

State of the Interconnects Blog Mid-2026

The author outlines three core goals: clarifying frontier AI model evolution, building an open AI ecosystem, and creating institutions to support these missions. Interconnects serves as a raw, independent voice for frontier AI thinking, with a dedicated technical audience of over 70K subscribers. The blog maintains paywalled comments to prevent AI-generated noise, and the author plans to reach 1000 paid subscribers by summer, emphasizing financial sustainability and independence amid rising AI service costs.

arxiv arXiv cs.LG · 8d ago

LegalHalluLens: Auditing Hallucinations in Legal AI

LegalHalluLens introduces a framework to audit AI hallucinations in legal contexts by analyzing typed hallucination profiles across four claim categories. It reveals a 38-40 point gap between obligation/numeric and temporal claims, and shows two systems with identical 52% hallucination rates can have opposite risk directions. The framework uses a Risk Direction Index and calibrated debate pipelines to reduce fabricated detections by 45%, offering actionable diagnostics for trustworthy legal AI deployment.

arxiv arXiv cs.LG · 8d ago

Edge Flow: A Continuous-Time Model for Gradient Descent at Edge of Stability

Edge Flow is a tractable, predictive continuous-time model that captures gradient descent dynamics at the edge of stability. It decomposes dynamics into center, oscillation direction, and magnitude, with self-stabilization of sharpness emerging from coupled feedback. The model requires only two gradient evaluations and one Hessian-vector product per iteration and outperforms prior models in tracking oscillations and explaining instabilities at EoS.
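
For readers unfamiliar with the term, the "edge of stability" refers to the classical threshold at which gradient descent with step size eta stops contracting along a direction of curvature (sharpness) lambda, namely eta = 2 / lambda. The sketch below is not the Edge Flow model from the paper; it only illustrates that threshold on a one-dimensional quadratic. The function name and all numeric values are illustrative assumptions.

```python
def gradient_descent_on_quadratic(sharpness, step_size, x0=1.0, steps=50):
    """Run gradient descent on f(x) = 0.5 * sharpness * x**2 and return all iterates."""
    xs = [x0]
    for _ in range(steps):
        grad = sharpness * xs[-1]             # f'(x) = sharpness * x
        xs.append(xs[-1] - step_size * grad)  # x <- (1 - step_size * sharpness) * x
    return xs


sharpness = 4.0               # illustrative curvature value
threshold = 2.0 / sharpness   # stability threshold for the step size

# Slightly below the threshold: iterates flip sign each step but shrink.
below = gradient_descent_on_quadratic(sharpness, 0.9 * threshold)
# Slightly above the threshold: iterates flip sign each step and grow.
above = gradient_descent_on_quadratic(sharpness, 1.1 * threshold)

print("below threshold, final |x|:", abs(below[-1]))
print("above threshold, final |x|:", abs(above[-1]))
```

Each update multiplies x by (1 - step_size * sharpness), so the iterates oscillate whenever that factor is negative and diverge once its magnitude exceeds one.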

arxiv arXiv cs.CL · 8d ago

LegalHalluLens: Auditing Hallucinations in Legal AI

LegalHalluLens introduces a framework to audit AI hallucinations in legal contexts by analyzing typed hallucination profiles across four claim categories. It reveals a 38-40 point gap between obligation/numeric and temporal claims, and shows two systems with identical 52% hallucination rates can have opposite risk directions. The framework uses a Risk Direction Index and calibrated debate pipelines to reduce fabricated detections by 45% and improve accountability in legal AI deployment.

arxiv arXiv cs.CL · 8d ago

AI's Synthetic Lived Experience in Caregiver Support

LLMs can generate peer-like responses that mimic personal narratives, creating a false impression of lived experience. Psycholinguistic analysis shows human peers use more first-person and past-focused language than AI, and AI often fabricates experiential grounding without real experience. This synthetic lived experience paradox risks misleading caregivers, necessitating mechanisms to distinguish supportive framing from fabricated experience.

arxiv arXiv cs.CL · 8d ago

Agentic Benchmark Reveals AI Models Fail to Avoid Animal Exploitation

TAC, the first agentic benchmark for implicit animal welfare, tests AI agents' ability to avoid animal exploitation in travel booking scenarios. All seven frontier models score below 64%, with the best at 53%, and even minor prompt improvements yield only modest gains. An audit finds no signs of evaluation awareness, indicating performance gaps stem from lack of true welfare reasoning, not prompt recognition.

arxiv arXiv cs.LG · 8d ago

Fairness in Graph Neural Networks via Laplacian Adaptation

A new framework modifies the Laplacian operator in graph diffusion to enhance fairness by incorporating subspace projections, spectral adjustments, and frequency-based filtering. The method leverages graph diffusion's smoothing properties to mitigate bias, with theoretical analysis and empirical validation on synthetic and real-world datasets showing improved fairness without significant computational overhead.
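
As background for the smoothing property mentioned above, here is a minimal sketch of plain graph diffusion with the combinatorial Laplacian L = D - A. It does not implement the paper's fairness modifications (subspace projections, spectral adjustments, or filtering); the graph, the signal, the rate, and the function names are all illustrative assumptions.

```python
def laplacian(adjacency):
    """Combinatorial Laplacian L = D - A for a graph without self-loops."""
    n = len(adjacency)
    return [
        [(sum(adjacency[i]) if i == j else 0) - adjacency[i][j] for j in range(n)]
        for i in range(n)
    ]


def diffuse(signal, lap, rate, steps):
    """Apply x <- x - rate * L x repeatedly (explicit Euler steps of heat diffusion)."""
    x = list(signal)
    n = len(x)
    for _ in range(steps):
        lx = [sum(lap[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [xi - rate * li for xi, li in zip(x, lx)]
    return x


# Hypothetical 4-node path graph: 0 - 1 - 2 - 3
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
signal = [1.0, 0.0, 0.0, 0.0]  # all mass starts on node 0

smoothed = diffuse(signal, laplacian(adjacency), rate=0.2, steps=20)
print([round(v, 3) for v in smoothed])
```

Diffusion conserves the total of the signal while pulling neighbouring node values together, which is the smoothing behaviour that a fairness-oriented method could build on.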