OpenAI Launches Daybreak Security Tools
OpenAI has introduced Codex Security and GPT-5.5-Cyber as part of its Daybreak suite. These tools aim to help organizations identify, validate, and patch vulnerabilities at scale.
NVIDIA has introduced Halos for Robotics, a full-stack functional safety system designed for physical AI. It enables AI-driven safety in unstructured environments where robots operate autonomously alongside humans in factories, warehouses, hospitals, and homes.
LLMs do not merely hallucinate; they amplify human epistemic overconfidence by turning weak hypotheses into coherent, polished claims before evidence is verified. This creates a risk of premature certainty in research, policy, and other domains, not because models lie, but because they accelerate human tendencies to favor elegant explanations over uncertainty.
The Qwen 3.6 27B model has been modified with Apostate to remove its safety alignment, cutting its refusal rate from 92% to 7.6%. The change reportedly has minimal impact on the model's capabilities, with a KL divergence of 0.120.
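For readers unfamiliar with the metric, here is a minimal sketch of how a KL divergence between two next-token distributions can be computed. The three-token distributions are invented for illustration, and the direction (original versus modified) and averaging scheme behind the reported 0.120 are not stated in the item, so both are assumptions here.

```python
import math

def kl_divergence(p, q):
    """Return KL(p || q) in nats for two discrete distributions on the same support."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi <= 0.0:
                return math.inf  # p puts mass where q has none
            total += pi * math.log(pi / qi)
    return total

# Hypothetical next-token probabilities for one prompt (not real model outputs).
original = [0.70, 0.20, 0.10]
modified = [0.60, 0.25, 0.15]

print(f"KL(original || modified) = {kl_divergence(original, modified):.4f} nats")
```

A value near zero means the two models assign nearly the same probabilities; in practice the figure would typically be averaged over many prompts and token positions.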
An AI Control Roadmap has been introduced to secure internal systems by integrating traditional safeguards with real-time monitoring capabilities.
GLM-5.2 is widely regarded as the first open-weight coding model that rivals frontier models like Opus 4.8 and GPT-5.5 in capability. Practitioners highlight its strong tool use, long-horizon planning, and autonomous subagent behavior, with consensus that it now credibly operates in the frontier SWE range. The model's emergence underscores the growing value of open weights for provider competition, on-prem deployment, and reduced vendor lock-in.
LLM benchmarking is increasingly seen as marketing rather than objective measurement. Users are asking which benchmarks are genuinely meaningful for local models, as opposed to superficial score-based claims.
Users report that local language models refuse to answer questions even when no external guardrails are in place, raising concerns about censorship in decentralized AI setups. The issue was shared on Reddit's LocalLLaMA community, where users describe models blocking responses to legitimate queries.
NRT-Bench introduces a benchmark for multi-turn red-teaming of LLM agents operating in a simulated nuclear power plant. Across four frontier operator models, 8.7% to 12.1% of attack sessions result in loss of a critical safety function, with vulnerabilities largely disjoint across models. The effectiveness of defences is also strongly model-dependent.
Agentic AI systems face growing threats from model-guided automated attacks. A new defense strategy, Contextual Misdirection via Progressive Engagement (CMPE), reduces attacker success rates by up to two orders of magnitude and nearly eliminates verified attack success in benchmark tests.
Contagion Networks introduces a framework to measure how evaluator biases spread among LLM agents. In a 3-agent experiment, biases propagated consistently with contagion coefficients between 0.157 and 0.352, and homogeneous-model agents showed significantly weaker contagion than cross-model setups. Increasing evaluator committee size from k=1 to k=3 reduced effective contagion by 72.4%.
CWE-Trace evaluates 8 vanilla and 15 LoRA-fine-tuned LLMs on Linux kernel vulnerability detection. Results show data contamination offers no advantage, and fine-tuning only shifts output thresholds without altering decision policies. Despite improved detection scores, LLMs lack reliable security reasoning, with top-1 CWE accuracy below 1.3% and binary detection performance at 52.1%.
FreeStyle proposes a framework that mines community LoRAs to generate large-scale style-content dual-reference image triplets. It employs a two-stage curriculum with disentanglement mechanisms to suppress style leakage and introduces a benchmark with style-invariant and VLM-based scores to evaluate content preservation and leakage rejection.
Studies show benign and harmful compliance demonstrations are not interchangeable in LLMs. Benign demonstrations can either reduce or increase harmful compliance depending on the model, with preference optimization playing a key role in preventing harmful compliance. Demonstration ordering shows strong recency bias, and models vary in how they handle refusal during in-context learning.
A new framework enables secure, probabilistic policy enforcement for AI agents in ambiguous environments. It uses distributionally robust optimization to compute rigorous upper bounds on policy violation probabilities without assuming predicate independence. The method outperforms prior approaches on terminal and tool-calling agent benchmarks, improving the security-utility trade-off.
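To illustrate why dropping the independence assumption matters, the sketch below uses the classical Fréchet (union) bounds, which hold under any dependence between predicates. This is a much simpler stand-in, not the framework's distributionally robust optimization; the function names and the per-predicate probabilities are hypothetical.

```python
import math

def conjunction_upper_bound(probs):
    """Upper bound on P(all predicates hold), valid under any dependence."""
    return min(probs)

def disjunction_upper_bound(probs):
    """Upper bound on P(at least one predicate holds), valid under any dependence."""
    return min(1.0, sum(probs))

def independent_disjunction(probs):
    """P(at least one predicate holds) if the predicates were independent."""
    return 1.0 - math.prod(1.0 - p for p in probs)

# Hypothetical probabilities that each of three policy predicates is violated.
violation_probs = [0.02, 0.05, 0.01]

worst_case = disjunction_upper_bound(violation_probs)
naive = independent_disjunction(violation_probs)
print(f"dependence-free bound: {worst_case:.4f}")
print(f"independence estimate: {naive:.4f}")

# An enforcement layer could allow an action only if the bound is under a threshold.
THRESHOLD = 0.10  # assumed risk tolerance
print("allow" if worst_case <= THRESHOLD else "block")
```

The independence estimate can understate risk when predicates are correlated, which is why a bound that holds for every joint distribution is the safer quantity to enforce against.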
The Sovereign Execution Broker (SEB) introduces a runtime enforcement boundary that verifies and executes certified authority in agentic systems. It validates execution contracts, checks validity periods, and ensures policy compliance before invoking infrastructure APIs, providing a short-lived, auditable, and revocable execution capability. The prototype was evaluated on AWS and Kubernetes, measuring latency, revocation propagation, and fault injection resistance.
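The checks described above can be sketched roughly as follows. Everything here is illustrative: `ExecutionContract`, `authorize`, the field names, and the action strings are hypothetical and not SEB's actual interface, and a real broker would also verify a cryptographic signature on the contract, which this sketch omits.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass(frozen=True)
class ExecutionContract:
    contract_id: str
    action: str            # e.g. "k8s:scale-deployment" (made-up action name)
    not_before: datetime
    not_after: datetime

def authorize(contract, now, revoked_ids, allowed_actions):
    """Return (allowed, reason); all checks must pass before any API call is made."""
    if contract.contract_id in revoked_ids:
        return False, "revoked"
    if not (contract.not_before <= now <= contract.not_after):
        return False, "outside validity period"
    if contract.action not in allowed_actions:
        return False, "policy violation"
    return True, "ok"

now = datetime.now(timezone.utc)
contract = ExecutionContract(
    contract_id="c-001",
    action="k8s:scale-deployment",
    not_before=now - timedelta(seconds=5),
    not_after=now + timedelta(minutes=2),  # short-lived by construction
)

print(authorize(contract, now, revoked_ids=set(), allowed_actions={"k8s:scale-deployment"}))
print(authorize(contract, now, revoked_ids={"c-001"}, allowed_actions={"k8s:scale-deployment"}))
```

Checking revocation first means a revoked contract is rejected even while its validity window is still open.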
LedgerAgent introduces a structured ledger to maintain task states separately in tool-calling agents. It renders states into prompts and enforces policy constraints before tool execution, reducing policy violations and improving performance across customer-service domains.
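As a rough illustration of the pattern, the sketch below keeps task state in a ledger separate from the conversation, renders it into prompt text, and checks a policy before a tool runs. The `Ledger` class, the `issue_refund` tool, and the refund rule are all hypothetical and not taken from LedgerAgent itself.

```python
from dataclasses import dataclass, field

@dataclass
class Ledger:
    """Task state kept outside the free-form conversation history."""
    facts: dict = field(default_factory=dict)

    def render(self):
        """Render the state as text that can be placed in the model prompt."""
        lines = [f"- {key}: {value}" for key, value in sorted(self.facts.items())]
        return "Task state:\n" + "\n".join(lines)

def refund_policy(ledger, tool, args):
    """Assumed rule: refunds need a verified identity and must not exceed the order total."""
    if tool != "issue_refund":
        return True
    verified = ledger.facts.get("identity_verified") is True
    within_total = args.get("amount", 0) <= ledger.facts.get("order_total", 0)
    return verified and within_total

def call_tool(ledger, tool, args, policy):
    """Enforce the policy against the ledger before the tool is executed."""
    if not policy(ledger, tool, args):
        return "blocked"
    ledger.facts["last_tool"] = tool  # record the state change
    return "executed"

ledger = Ledger({"identity_verified": False, "order_total": 40})
print(ledger.render())
print(call_tool(ledger, "issue_refund", {"amount": 25}, refund_policy))
ledger.facts["identity_verified"] = True
print(call_tool(ledger, "issue_refund", {"amount": 25}, refund_policy))
```

Because the check reads the ledger rather than the model's own recollection of the dialogue, a policy decision does not depend on the model correctly tracking state across turns.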
A new defense framework enhances deep neural networks' resilience to false data injection attacks in power grids by adding a padding layer with pseudofeatures derived from input statistical distributions. This lightweight, model-agnostic approach increases input dimensionality in a randomized, data-aware way, making adversarial perturbations non-transferable and unpredictable, thus effectively countering attacks without performance degradation.
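One way to picture the padding idea is the sketch below, which widens each input with pseudofeatures sampled from per-feature statistics of the training data and places them at slots chosen once from a private random seed. The item does not specify how pseudofeatures are drawn or positioned, so the Gaussian sampling, the fixed secret layout, and all function names are assumptions.

```python
import random
import statistics

def fit_feature_stats(samples):
    """Per-feature (mean, population std) estimated from training inputs."""
    columns = list(zip(*samples))
    return [(statistics.mean(col), statistics.pstdev(col)) for col in columns]

def make_layout(n_features, n_pad, rng):
    """Choose once, from a private seed, which slots of the widened vector are padding."""
    return set(rng.sample(range(n_features + n_pad), n_pad))

def pad_input(x, stats, pad_slots, rng):
    """Interleave real features with pseudofeatures drawn from the fitted statistics."""
    real = iter(x)
    padded = []
    for slot in range(len(x) + len(pad_slots)):
        if slot in pad_slots:
            mean, std = rng.choice(stats)
            padded.append(rng.gauss(mean, std))
        else:
            padded.append(next(real))
    return padded

# Made-up measurements standing in for grid sensor readings.
training = [[1.00, 0.98, 1.02], [1.01, 0.97, 1.03], [0.99, 0.99, 1.01]]
stats = fit_feature_stats(training)

rng = random.Random(1234)  # private seed, unknown to the attacker
pad_slots = make_layout(n_features=3, n_pad=2, rng=rng)
print(pad_input([1.00, 0.98, 1.02], stats, pad_slots, rng))
```

An attacker who crafts a perturbation for the original feature layout does not know which positions carry real measurements in the widened input, which is the intuition behind the claimed loss of transferability.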
A new framework addresses data bias in machine learning by incorporating coverage constraints to ensure sufficient representation of intersectional subgroups. It trades small bias errors for greater data efficiency and formulates bias mitigation as an integer linear program, characterizing the price of fairness as a function of fairness tolerance to guide data governance and legal compliance.
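The coverage constraint on its own can be illustrated with a short check that counts every intersectional subgroup and flags those below a minimum. This covers only the constraint, not the integer linear program; the attribute names, domains, records, and threshold are invented for the example.

```python
from collections import Counter
from itertools import product

def undercovered_subgroups(records, attributes, domains, min_count):
    """Return {subgroup: count} for every attribute combination seen fewer than min_count times."""
    counts = Counter(tuple(record[a] for a in attributes) for record in records)
    all_combos = product(*(domains[a] for a in attributes))
    return {combo: counts[combo] for combo in all_combos if counts[combo] < min_count}

# Hypothetical dataset with two protected attributes.
records = [
    {"gender": "F", "age": "young"},
    {"gender": "F", "age": "young"},
    {"gender": "M", "age": "young"},
    {"gender": "M", "age": "old"},
    {"gender": "M", "age": "old"},
]
domains = {"gender": ["F", "M"], "age": ["young", "old"]}

gaps = undercovered_subgroups(records, ["gender", "age"], domains, min_count=2)
print(gaps)
```

Enumerating the full product of domains, rather than only the combinations present in the data, is what surfaces subgroups with zero records.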
This study introduces Riemannian sharpness, a reparametrization-invariant measure of flatness grounded in Fisher Information Matrix geometry. It proves SGD's stationary distribution concentrates at Riemannian-flat minima and links this geometric bias to generalization via a PAC-Bayes bound. Experiments on MNIST and CIFAR-10 show Riemannian sharpness better tracks generalization than Euclidean sharpness, with scaling consistent with theory.