Safety & alignment
arXiv cs.AI · 6d ago

Sovereign Execution Broker for Certificate-Bound Agentic Control

The Sovereign Execution Broker (SEB) introduces a runtime enforcement boundary that verifies and executes certified authority in agentic systems. It validates execution contracts, checks validity periods, and ensures policy compliance before invoking infrastructure APIs, providing a short-lived, auditable, and revocable execution capability. The prototype was evaluated on AWS and Kubernetes, measuring latency, revocation propagation, and fault injection resistance.
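The gate described above (check the contract, its validity period, and policy before any infrastructure call) could be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `ExecutionContract` fields, the `Broker` class, the allow-list standing in for a policy engine, and the in-memory revocation set are all hypothetical, and cryptographic verification of the certificate is omitted entirely.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ExecutionContract:
    # Hypothetical fields; the paper's actual contract schema is not given here.
    contract_id: str
    action: str            # e.g. "k8s:scale-deployment"
    not_before: datetime   # start of the validity period
    not_after: datetime    # end of the validity period


class Broker:
    """Toy enforcement boundary: every call is checked, then logged."""

    def __init__(self, allowed_actions, revoked_ids=None):
        self.allowed_actions = set(allowed_actions)  # stand-in for a policy engine
        self.revoked_ids = set(revoked_ids or [])    # stand-in for a revocation list
        self.audit_log = []

    def execute(self, contract, api_call, now=None):
        now = now or datetime.now(timezone.utc)
        if contract.contract_id in self.revoked_ids:
            decision = "denied: revoked"
        elif not (contract.not_before <= now <= contract.not_after):
            decision = "denied: outside validity period"
        elif contract.action not in self.allowed_actions:
            decision = "denied: policy"
        else:
            decision = "allowed"
        # Record every decision, allowed or not, so the trail is auditable.
        self.audit_log.append((now.isoformat(), contract.contract_id, decision))
        if decision != "allowed":
            raise PermissionError(decision)
        # Only now is the infrastructure API actually invoked.
        return api_call()
```

In this sketch the agent never holds infrastructure credentials itself; it hands the broker a contract plus a callable, and the broker decides whether the callable runs. A short `not_after` keeps the capability short-lived, and adding an id to `revoked_ids` cuts it off early.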

arXiv cs.LG · 6d ago

Lightweight Defense Against False Data Injection in Power Grids

A defense framework hardens deep neural networks against false data injection attacks in power grids by adding a padding layer of pseudofeatures derived from the statistical distributions of the inputs. The lightweight, model-agnostic approach increases input dimensionality in a randomized, data-aware way, which makes adversarial perturbations unpredictable and non-transferable and counters attacks without degrading performance.
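One way to picture the padding idea is the sketch below. It assumes details the summary does not give: pseudofeatures are drawn from a Gaussian fitted to a randomly chosen real feature, and the positions of the padded slots are fixed once by a secret random layout so the downstream model sees a consistent input shape. The function names `fit_feature_stats`, `make_layout`, and `pad` are hypothetical, and the paper's actual sampling scheme may differ.

```python
import random
import statistics


def fit_feature_stats(samples):
    """Per-feature (mean, stdev) estimated from clean training measurements."""
    columns = list(zip(*samples))
    return [(statistics.mean(col), statistics.pstdev(col)) for col in columns]


def make_layout(n_features, n_pad, rng):
    """Choose, once, where pseudofeatures sit among the real features.

    Each slot is ("real", i) or ("pseudo", i), where i indexes the real
    feature that is passed through or whose statistics are mimicked.
    """
    slots = [("real", i) for i in range(n_features)]
    for _ in range(n_pad):
        source = rng.randrange(n_features)
        # randint is inclusive, so a pseudofeature may also land at the end.
        slots.insert(rng.randint(0, len(slots)), ("pseudo", source))
    return slots


def pad(x, layout, stats, rng):
    """Build the widened input: real values pass through, pseudo slots are sampled."""
    out = []
    for kind, i in layout:
        if kind == "real":
            out.append(x[i])
        else:
            mean, std = stats[i]
            out.append(rng.gauss(mean, std))
    return out
```

Because the pseudofeatures follow the same distributions as real measurements, an attacker who does not know the layout cannot easily tell which input dimensions matter, which is the intuition behind the non-transferability claim. The network itself is left unchanged apart from its wider input, which matches the model-agnostic framing.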

arXiv cs.LG · 6d ago

Riemannian Sharpness Explains SGD's Bias Toward Flat Minima

This study introduces Riemannian sharpness, a reparametrization-invariant measure of flatness grounded in Fisher Information Matrix geometry. It proves SGD's stationary distribution concentrates at Riemannian-flat minima and links this geometric bias to generalization via a PAC-Bayes bound. Experiments on MNIST and CIFAR-10 show Riemannian sharpness better tracks generalization than Euclidean sharpness, with scaling consistent with theory.

arXiv cs.LG · 6d ago

How Safety-Aligned LLMs Interpret Mixed Compliance Demonstrations

A study finds benign and harmful compliance demonstrations are not interchangeable in language models. Benign demonstrations can either reduce or increase harmful compliance depending on the model, with preference optimization playing a key role in preventing harmful compliance. The research also reveals recency bias in demonstration ordering and varied model behaviors in handling refusals during in-context learning.

arXiv cs.LG · 6d ago

Sovereign Execution Broker for Certificate-Bound Agentic Control

The Sovereign Execution Broker (SEB) introduces a runtime enforcement boundary that verifies and executes certified authority in agentic systems. It ensures production mutation authority is isolated from non-deterministic reasoning by validating execution contracts, validity windows, and revocation states before invoking infrastructure APIs. The prototype demonstrates secure, auditable execution on AWS and Kubernetes with measurable latency and fault resilience.

arXiv cs.CL · 6d ago

StylisticBias: Visual Cues Drive Most Social Biases in MLLMs

StylisticBias introduces a controlled benchmark to evaluate attribute-level social bias in multimodal large language models. It reveals that age and body type dominate identity-level effects, while fashion style and 15 key visual attributes drive most bias, accounting for nearly 80% of variation. The benchmark highlights that model judgments are most sensitive to appearance-related cues, especially in socioeconomic and style-based contexts.

arXiv cs.AI · 6d ago

LLM Psychological Profiles Are Measurement Artifacts

A formal psychometric analysis shows that apparent psychological profiles of large language models are primarily driven by response bias, not actual traits. This bias, which causes models to consistently favor one end of a scale, accounts for 81-90% of between-model variation, far exceeding human differences. The study concludes that these profiles are artifacts of instrument design and not true model properties, urging the development of assessments based on response orthogonality.