korshunov.ai — ML news

v2.1.183 Release Notes

v2.1.183 improves auto mode safety by blocking destructive git and destroy commands without explicit user consent. It adds deprecation warnings for models, introduces attribution.sessionUrl to hide session links, and fixes multiple issues including terminal behavior, subagent performance, and input handling in web and tmux environments.

github AutoGPT · 7d ago

autogpt-platform-beta-v0.6.64 Released

The autogpt-platform-beta-v0.6.64 release, dated 18 June 2026, introduces the AutoPilot Context Panel and Global Search, along with enhancements to graph saving, caching, and builder performance. It also includes security hardening, bug fixes for LLM provider issues, and UI improvements such as a high-resolution touch icon.

lab Claude Code Releases · 8d ago

Claude Code v2.1.181 Release Notes

Claude Code v2.1.181 adds support for changing settings with prompt syntax such as /config thinking=false, adds sandbox Apple Events support on macOS, and improves streaming, auto-retry, and subagent behavior. It also fixes numerous bugs related to startup, file handling, clipboard, and UI responsiveness across platforms.

lab Claude Code Releases · 9d ago

Claude Code v2.1.178 Release Notes

Claude Code v2.1.178 introduces new permission rules using Tool(param:value) syntax, improved workflow and skill loading in nested directories, and enhanced auto mode and error messaging. It fixes critical issues including crashes, authentication errors, and UI behavior in Chrome and VSCode, while refining tool prompts and undo functionality.

github llama.cpp · 6d ago

llama.cpp release b9718: consolidated slot selection and new binary builds

llama.cpp version b9718 consolidates slot selection into a single function, get_available_slot, while maintaining LCP similarity checks for prompt cache updates. The release includes binary builds for macOS, Linux, Android, Windows, and openEuler across multiple architectures and hardware acceleration options.
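
The idea of choosing a slot by longest-common-prefix (LCP) similarity can be illustrated with a minimal Python sketch. It is not the C++ code from the release: the `Slot` fields, the `min_similarity` threshold, the normalisation by cached length, and the least-recently-used fallback are all assumptions made for illustration; only the name `get_available_slot` comes from the summary above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def common_prefix_len(a: List[int], b: List[int]) -> int:
    """Number of leading tokens shared by two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


@dataclass
class Slot:
    # Hypothetical slot record; field names are assumptions.
    id: int
    busy: bool = False
    cached_tokens: List[int] = field(default_factory=list)
    last_used: int = 0


def get_available_slot(slots: List[Slot], prompt: List[int],
                       min_similarity: float = 0.5) -> Optional[Slot]:
    """Pick an idle slot, preferring the one whose cached prompt best matches."""
    idle = [s for s in slots if not s.busy]
    if not idle:
        return None
    best, best_sim = None, 0.0
    for s in idle:
        if not s.cached_tokens:
            continue
        # Assumed similarity: shared prefix length relative to the cached prompt.
        sim = common_prefix_len(s.cached_tokens, prompt) / len(s.cached_tokens)
        if sim >= min_similarity and sim > best_sim:
            best, best_sim = s, sim
    if best is not None:
        return best
    # Assumed fallback: the least recently used idle slot.
    return min(idle, key=lambda s: s.last_used)
```

Keeping the similarity check and the fallback in one function means every caller sees the same selection policy, which is the apparent point of the consolidation.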

github llama.cpp · 6d ago

llama.cpp Release b9721 Available for Multiple Platforms

llama.cpp has released version b9721, offering binaries for macOS, Linux, Android, Windows, and openEuler across various architectures. The release includes CPU, Vulkan, ROCm, OpenVINO, SYCL, and HIP support, with a dedicated UI package. A feature for Apple Silicon with KleidiAI is currently disabled.

github OpenAI Agents SDK · 6d ago

v0.17.6 Release Notes

The v0.17.6 release adds pre-approval tool input guardrails and SDK-only custom data for tool outputs. It also enforces a strict JSON-compatible contract for tool outputs and suppresses unnecessary whitespace warnings in tool names. @siddiksawani made their first contribution in this release.

github llama.cpp · 7d ago

llama.cpp Release b9715 Adds CUDA Col2Im 1D and Multiple Platform Binaries

llama.cpp version b9715 introduces CUDA support for GGML_OP_COL2IM_1D, building on a CPU implementation. The release includes binaries for macOS, Linux, Android, Windows, and openEuler across multiple architectures and acceleration frameworks, including Vulkan, ROCm, OpenVINO, and SYCL.

arxiv arXiv cs.AI · 7d ago

NRT-Bench: Multi-turn Red-teaming of LLM Agents in Safety-Critical Systems

NRT-Bench introduces a benchmark for multi-turn red-teaming of LLM agents operating in a simulated nuclear power plant. Across four frontier operator models, 8.7% to 12.1% of attack sessions result in loss of a critical safety function, with vulnerabilities largely disjoint across models. The effectiveness of defences is also strongly model-dependent.

arxiv arXiv cs.AI · 7d ago

Defensive Misdirection Against Automated Attacks on Agentic AI

Agentic AI systems face growing threats from model-guided automated attacks. A new defense strategy, Contextual Misdirection via Progressive Engagement (CMPE), reduces attacker success rates by up to two orders of magnitude and nearly eliminates verified attack success in benchmark tests.

arxiv arXiv cs.AI · 7d ago

UltraQuant: 4-bit KV Caching for Context-Heavy Agents

UltraQuant enables 4-bit KV caching for context-heavy agents, reducing P50 time-to-first-token by 3.47x in late rounds and boosting output throughput by 1.63x over an FP8 KV baseline. It uses FP8 queries, FP4 KV tensors, UE8M0 group scales, and native scaled-MFMA on AMD CDNA4 GPUs. It also optimizes the decode-attention kernels and adds robustness measures such as asymmetric K/V treatment and Walsh-Hadamard rotation.
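
To make the ingredients concrete, here is a small pure-Python sketch of two of them: an orthonormal Walsh-Hadamard rotation, and group quantization to the FP4 (E2M1) value grid with a power-of-two scale, which is what an exponent-only UE8M0 scale amounts to. This is not the paper's kernel. The function names, the scale-selection rule, and nearest-value rounding are assumptions; the exponent is left unclamped, and the asymmetric K/V treatment is not modelled.

```python
import math
from typing import List, Tuple

# Magnitudes representable by a 4-bit E2M1 float (sign handled separately).
FP4_E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def hadamard(x: List[float]) -> List[float]:
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    y = list(x)
    n = len(y)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    s = 1.0 / math.sqrt(n)
    return [v * s for v in y]


def quantize_group(x: List[float]) -> Tuple[int, List[float]]:
    """Return (scale exponent, FP4 grid values) for one group of values."""
    amax = max(abs(v) for v in x)
    if amax == 0.0:
        return 0, [0.0] * len(x)
    # Assumed rule: smallest power-of-two scale that maps amax into [0, 6].
    exp = math.ceil(math.log2(amax / FP4_E2M1[-1]))
    scale = 2.0 ** exp
    q = []
    for v in x:
        mag = min(FP4_E2M1, key=lambda g: abs(abs(v) / scale - g))
        q.append(math.copysign(mag, v))
    return exp, q


def dequantize_group(exp: int, q: List[float]) -> List[float]:
    """Undo the scaling; the rounding error from quantize_group remains."""
    return [v * 2.0 ** exp for v in q]
```

Applying `hadamard` before `quantize_group` spreads a single large outlier across the whole group, so a shared scale wastes less of the coarse FP4 grid. Because the orthonormal transform is its own inverse, calling `hadamard` again after dequantization recovers the original basis.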

arxiv arXiv cs.AI · 7d ago

Evaluator Bias Propagation in Multi-Agent LLM Systems

Contagion Networks introduces a framework to measure how evaluator biases spread among LLM agents. In a 3-agent experiment, biases propagated consistently with contagion coefficients between 0.157 and 0.352, and homogeneous-model agents showed significantly weaker contagion than cross-model setups. Increasing evaluator committee size from k=1 to k=3 reduced effective contagion by 72.4%.

arxiv arXiv cs.AI · 7d ago

Calibration Without Comprehension in LLM Vulnerability Detection

CWE-Trace evaluates 8 vanilla and 15 LoRA-fine-tuned LLMs on Linux kernel vulnerability detection. Results show data contamination offers no advantage, and fine-tuning only shifts output thresholds without altering decision policies. Despite improved detection scores, LLMs lack reliable security reasoning, with top-1 CWE accuracy below 1.3% and binary detection performance at 52.1%.

arxiv arXiv cs.AI · 7d ago

Efficient and Sound Probabilistic Verification for AI Agents

A new framework enables secure, probabilistic policy enforcement for AI agents in ambiguous environments. It uses distributionally robust optimization to compute rigorous upper bounds on policy violation probabilities without assuming predicate independence. The method outperforms prior approaches on terminal and tool-calling agent benchmarks, improving the security-utility trade-off.
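
Why dropping the independence assumption matters can be seen with a much simpler tool than the paper's distributionally robust optimization: Boole's inequality (the union bound). The sketch below is a stand-in, not the paper's method, and both function names are hypothetical. It compares a bound that holds under any dependence between predicates with the estimate obtained by assuming independence.

```python
from typing import Sequence


def union_bound(p: Sequence[float]) -> float:
    """Upper bound on P(at least one predicate is violated), valid under any dependence."""
    return min(1.0, sum(p))


def independent_estimate(p: Sequence[float]) -> float:
    """P(at least one violation) if the predicates were independent."""
    none_violated = 1.0
    for x in p:
        none_violated *= 1.0 - x
    return 1.0 - none_violated


# Two predicates, each violated with probability 0.25.
print(union_bound([0.25, 0.25]))           # 0.5
print(independent_estimate([0.25, 0.25]))  # 0.4375
```

The independence estimate is lower, so if the predicates are in fact correlated it can understate the risk, whereas the union bound stays sound at the cost of being looser.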

arxiv arXiv cs.AI · 7d ago

LedgerAgent: Structured State for Policy-Adherent Tool-Calling Agents

LedgerAgent introduces a structured ledger to maintain task states separately in tool-calling agents. It renders states into prompts and enforces policy constraints before tool execution, reducing policy violations and improving performance across customer-service domains.
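
A minimal sketch of the pattern described here, under stated assumptions: the `Ledger` class, the policy signature, the `issue_refund` tool, and the identity-verification rule are all hypothetical and are not taken from the paper. It shows state kept outside the conversation, rendered into prompt text, and checked against policies before any tool runs.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Ledger:
    """Task state kept separately from the chat history (hypothetical structure)."""
    facts: Dict[str, Any] = field(default_factory=dict)

    def render(self) -> str:
        """Render the state as text that could be placed into a prompt."""
        return "\n".join(f"{k}: {v}" for k, v in sorted(self.facts.items()))


# A policy returns a reason string to block the call, or None to allow it.
Policy = Callable[[Ledger, str, Dict[str, Any]], Optional[str]]


def refund_requires_verified_identity(ledger: Ledger, tool: str,
                                      args: Dict[str, Any]) -> Optional[str]:
    # Example policy (assumed): refunds need a verified customer identity.
    if tool == "issue_refund" and not ledger.facts.get("identity_verified", False):
        return "identity must be verified before a refund"
    return None


def guarded_call(ledger: Ledger, policies: List[Policy],
                 tools: Dict[str, Callable[..., Any]],
                 tool: str, args: Dict[str, Any]) -> Dict[str, Any]:
    """Check every policy against the ledger, then execute the tool if allowed."""
    for policy in policies:
        reason = policy(ledger, tool, args)
        if reason is not None:
            return {"ok": False, "blocked": reason}
    result = tools[tool](**args)
    ledger.facts[f"last_{tool}"] = result  # record the outcome as new state
    return {"ok": True, "result": result}
```

Because the check reads the ledger rather than the model's free-text recollection of the conversation, a violation is caught even when the model has lost track of an earlier step.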

github CrewAI · 7d ago

v1.14.8a2 Release Notes

v1.14.8a2 adds a single agent action to Flow definitions and validates CEL expressions at load time. It includes a new Datadog integration guide with an importable operations dashboard and updated snapshot and changelog for v1.14.8a1.

arxiv arXiv cs.LG · 7d ago

UltraQuant: 4-bit KV Caching for Context-Heavy Agents

UltraQuant introduces a 4-bit KV caching method tailored for context-heavy agent workloads. It achieves 3.47x reduction in P50 time-to-first-token in late rounds and 1.63x higher output throughput compared to FP8 KV caching, using FP8 queries, FP4 KV tensors, and native AMD CDNA4 scaled-MFMA support.

arxiv arXiv cs.LG · 7d ago

Marginal Advantage Accumulation for Memory-Driven Agent Self-Evolution

This paper introduces Marginal Advantage Accumulation (MAA), a post-processing architecture that addresses cross-batch inconsistency in memory-driven agent self-evolution. MAA formalizes alignment and comparability as structural conditions, uses differential signals and exponential moving average to accumulate signed evidence per operation, and ensures traceability via semantic identity merging. It outperforms batch-level baselines in 14 out of 16 settings and reduces token consumption by about 75%.
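
The accumulation step can be sketched as follows. This is an illustration under assumptions, not the paper's implementation: the class name, the `decay` parameter, the use of a with/without score difference as the differential signal, and the premise that `op_id` is already a canonical key after merging are all hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict


class EvidenceAccumulator:
    """Exponential moving average of signed evidence, kept per memory operation."""

    def __init__(self, decay: float = 0.5) -> None:
        self.decay = decay
        self.score: DefaultDict[str, float] = defaultdict(float)

    def update(self, op_id: str, score_with: float, score_without: float) -> float:
        # Differential signal: positive if the operation helped in this batch,
        # negative if it hurt. Differences are comparable across batches even
        # when the absolute scores are not.
        signal = score_with - score_without
        self.score[op_id] = self.decay * self.score[op_id] + (1.0 - self.decay) * signal
        return self.score[op_id]


acc = EvidenceAccumulator(decay=0.5)
print(acc.update("add:note-1", 1.0, 0.5))  # 0.25
print(acc.update("add:note-1", 0.0, 1.0))  # -0.375
```

A single batch cannot flip the verdict on an operation unless its signal is strong; the sign of the running score reflects evidence accumulated over several batches.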

arxiv arXiv cs.LG · 7d ago

LLM Alignment Using Implicit User Feedback

A new dataset, IFLLM, collects mouse-trajectory and eye-gaze data from users interacting with LLMs. It shows that implicit feedback significantly improves LLM alignment, boosting text-based reward model accuracy from 55% to 64% and nearly tripling response quality improvements after DPO training on eight LLMs.

arxiv arXiv cs.LG · 7d ago

Evaluator Bias Propagation in Multi-Agent LLM Systems

Contagion Networks introduces a framework to measure how evaluator biases spread among LLM agents. In a 3-agent experiment, biases propagate with coefficients between 0.157 and 0.352, and homogeneous-model agents show significantly weaker contagion than cross-model setups. Increasing evaluator committee size from k=1 to k=3 reduces effective contagion by 72.4%.