korshunov.ai — ML news

autogpt-platform-beta-v0.6.64 Released

The autogpt-platform-beta-v0.6.64 release, dated 18th June 2026, introduces new features such as the AutoPilot Context Panel and Global Search, along with enhancements to graph saving, caching, and builder performance. It also includes security hardening, bug fixes for LLM provider issues, and UI improvements like a high-resolution touch icon.

lab Claude Code Releases · 9d ago

Claude Code v2.1.181 Release Notes

Claude Code v2.1.181 lets users change settings via prompt syntax such as /config thinking=false, adds sandbox Apple Events support on macOS, and improves streaming, auto-retry, and subagent behavior. It also fixes numerous bugs related to startup, file handling, clipboard, and UI responsiveness across platforms.

lab Claude Code Releases · 10d ago

Claude Code v2.1.178 Release Notes

Claude Code v2.1.178 introduces new permission rules using Tool(param:value) syntax, improved workflow and skill loading in nested directories, and enhanced auto mode and error messaging. It fixes critical issues including crashes, authentication errors, and UI behavior in Chrome and VSCode, while refining tool prompts and undo functionality.

arxiv arXiv cs.CL · 8d ago

HydraHead: Head-Level Hybrid Attention for Long-Context Performance

HydraHead hybridizes Full and Linear Attention at the head level, using interpretability to select retrieval-critical heads and fusing their outputs through a scale-normalized module. Trained on 15B tokens, it improves on the baseline by more than 69% at 512K context length, outperforming layer-wise hybrids and approaching Qwen3.5's performance on long-context tasks.

arxiv arXiv cs.CL · 8d ago

Causal Activation Directions for Mitigating Emergent Misalignment in Language Models

Fine-tuning language models on insecure code causes emergent misalignment. A shared activation direction across four model families achieves 99.6% separation of aligned and misaligned activations, and subtracting it reduces code spillover by 21-51 points. Cross-architecture transfer suppresses the behavior but lacks specificity: within-model directions are causally actionable, whereas cross-model directions are only causally real.
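
As a rough illustration of what subtracting an activation direction can mean, the sketch below removes the component of a hidden-state vector that lies along a given direction. This is a minimal sketch under assumptions: the function names, the `scale` parameter, and the toy vectors are hypothetical, and the paper may apply the subtraction differently (for example, with a fixed offset rather than a projection).

```python
import math


def unit(vector):
    """Return the vector scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in vector]


def subtract_direction(activation, direction, scale=1.0):
    """Remove (scale times) the component of `activation` along `direction`.

    `scale` is a hypothetical knob: 1.0 projects the direction out fully,
    smaller values only dampen it.
    """
    d = unit(direction)
    coeff = sum(a * x for a, x in zip(activation, d))  # projection length
    return [a - scale * coeff * x for a, x in zip(activation, d)]


# Toy example: the second coordinate plays the role of the shared direction.
hidden = [3.0, 4.0, 0.0]
shared_direction = [0.0, 2.0, 0.0]
steered = subtract_direction(hidden, shared_direction)
print(steered)  # [3.0, 0.0, 0.0]
```

In a real model the same operation would be applied to the activations of a chosen layer during the forward pass; which layer and how the direction is estimated are not stated in the summary above.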

media r/LocalLLaMA · 8d ago

GLM-5.2 Outperforms GPT-5.5 in AA-Briefcase Evaluation

Artificial Analysis' new agentic knowledge work evaluation, AA-Briefcase, shows GLM-5.2 surpassing GPT-5.5 in performance. The benchmark assesses real-world task execution and reasoning capabilities in knowledge work scenarios.

github LangGraph · 8d ago

LangGraph Releases Version 1.2.6

LangGraph releases version 1.2.6, fixing a regression where nested subgraphs incorrectly inherit parent checkpoint_ns. The update also improves cancellation of running subgraphs during stream aborts and includes a CLI version update to 0.4.30.

github llama.cpp · 8d ago

llama.cpp Release b9703: Updates and Binary Downloads

llama.cpp version b9703 includes a rework of the server's preset handling, removing remote HF preset support and deprecated functions. The release provides binaries for macOS, Linux, Android, Windows, and openEuler across multiple architectures and hardware acceleration options, including Vulkan, CUDA, OpenVINO, and SYCL.

github llama.cpp · 8d ago

llama.cpp Release b9704: Fixes Invalid Grammar Handling and Adds New Binaries

llama.cpp version b9704 now returns HTTP 400 for invalid grammar instead of silently dropping constraints. The release includes binaries for macOS, Linux, Android, Windows, and openEuler across multiple architectures and hardware accelerators, with support for Vulkan, ROCm, OpenVINO, SYCL, and CUDA.

media Don't Worry About the Vase · 8d ago

White House Pauses AI Deployment

The U.S. White House paused the deployment of frontier AI models, including Claude Fable 5 and Claude Mythos 5, citing a reported 'jailbreak' where the AI could identify and fix security vulnerabilities in code. Anthropic has been working with the Trump Administration to resolve the issue, but experts argue that the problem is fundamental: AI either can write secure code or it cannot, making a fix impossible without undermining its defensive capabilities.

github llama.cpp · 8d ago

llama.cpp Release b9698 Adds Self-Update Support and Multiple Platform Binaries

llama.cpp version b9698 enables self-updates only when built with llama-install.sh. The release includes binaries for macOS, Linux, Android, Windows, and openEuler across multiple architectures and hardware acceleration options, including Vulkan, CUDA, OpenVINO, and SYCL.

github CrewAI · 8d ago

CrewAI v1.14.8a Releases New FlowDefinition Features

CrewAI v1.14.8a introduces script and crew actions to FlowDefinition, adds DMN mode support, and enables flow execution without Python code. It also includes experimental support for JSON-first crews and ZIP deployment fallback, along with improved memory reset and token usage tracking.

github llama.cpp · 8d ago

llama.cpp Release b9693 Adds BF16 Support and Cross-Platform Binaries

llama.cpp version b9693 introduces BF16 support in its concat kernel and provides pre-built binaries for macOS, Linux, Android, Windows, and openEuler. The release includes CPU, Vulkan, ROCm, OpenVINO, SYCL, and HIP variants across multiple architectures, with a dedicated UI package available.

arxiv arXiv cs.LG · 8d ago

Discriminator-Guided RL Corrects Flow Matching with Data-Aligned Rewards

Discriminator-Guided RL (DRL) uses a pretrained representation space to train a discriminator that separates real data from model-generated samples. Its logit is used as a reward in KL-regularized RL, aligning model outputs with visual and semantic realism without human preferences. DRL improves FID and semantic FD across models like SiT and JiT, and enhances the Pareto frontier between preference and fidelity.
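
The idea of using a discriminator logit as the reward in KL-regularized RL can be sketched per sample as below. This is only an illustrative sketch: the function names, the `beta` coefficient, and the assumption that per-sample log-probabilities under the model and a reference model are available are all hypothetical, and the paper's actual objective may be formulated differently.

```python
import math


def discriminator_logit(prob_real):
    """Convert a discriminator probability of 'real' into a logit."""
    return math.log(prob_real / (1.0 - prob_real))


def kl_regularized_reward(disc_logit, logp_model, logp_ref, beta=0.1):
    """Discriminator reward minus a per-sample KL penalty.

    `logp_model - logp_ref` is the usual single-sample estimate of the
    KL term between the fine-tuned model and the reference model.
    """
    return disc_logit - beta * (logp_model - logp_ref)


# A sample the discriminator cannot tell apart from real data has logit 0.
print(discriminator_logit(0.5))  # 0.0

# Hypothetical numbers: logit 2.0, model has drifted 0.5 nats from the reference.
reward = kl_regularized_reward(2.0, logp_model=-1.0, logp_ref=-1.5, beta=0.5)
print(reward)  # 1.75
```

A larger `beta` keeps the model closer to its pretrained behavior, at the cost of a weaker push toward samples the discriminator rates as realistic.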

arxiv arXiv cs.LG · 8d ago

MAST Enables Selective Unlearning in RLVR-Induced Reasoning

MAST, a mechanism-guided unlearning method, achieves targeted forgetting of RLVR-induced reasoning with minimal collateral damage. On Qwen2.5-Math-1.5B and Qwen3-1.7B-Base, it significantly reduces MATH performance (45/150 to 37/150) while keeping GSM8K accuracy within +0.8 points and MATH retention within -0.5 points. Results hold across different seeds, objectives, and models, showing superior stability over full-parameter unlearning.

arxiv arXiv cs.LG · 8d ago

STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability

STARE addresses policy entropy collapse in GRPO-based reinforcement learning by identifying entropy-critical token subsets via surprisal quantiles and reweighting their advantages. It maintains stable policy entropy across model scales and tasks, outperforming DAPO and other baselines by 4%-8% on AIME24 and AIME25, with consistent exploration-exploitation balance.
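
A toy version of selecting tokens by surprisal quantile and reweighting their advantages might look like the sketch below. Everything specific here is an assumption for illustration: the quantile level `q`, the multiplier `weight`, the choice to reweight the high-surprisal tail, and the helper names are hypothetical and not taken from the paper.

```python
import math


def quantile(values, q):
    """Quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must be non-empty")
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac


def reweight_advantages(token_probs, advantages, q=0.8, weight=0.5):
    """Scale the advantages of tokens whose surprisal is at or above the q-quantile."""
    surprisal = [-math.log(p) for p in token_probs]  # surprisal = -log p(token)
    threshold = quantile(surprisal, q)
    return [
        adv * weight if s >= threshold else adv
        for adv, s in zip(advantages, surprisal)
    ]


# Five sampled tokens with their policy probabilities and a shared advantage.
probs = [0.9, 0.5, 0.1, 0.8, 0.05]
advs = [1.0, 1.0, 1.0, 1.0, 1.0]
print(reweight_advantages(probs, advs))  # [1.0, 1.0, 1.0, 1.0, 0.5]
```

Only the least likely token (probability 0.05) crosses the 0.8 surprisal quantile here, so it is the only one whose advantage is rescaled.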

arxiv arXiv cs.LG · 8d ago

TxBench-PP: AI Agent Performance in Preclinical Pharmacology

TxBench-PP is a verifiable benchmark for small-molecule preclinical pharmacology, testing AI agents' ability to derive accurate conclusions from real-world assay data. Across 16 model-harness configurations, no system reliably made correct preclinical pharmacology decisions; the two best configurations scored 59.3% (Claude Opus 4.8 / Pi) and 55.3% (GPT-5.5 / Pi) of endpoint attempts.

arxiv arXiv cs.LG · 8d ago

OneCanvas: 3D Scene Understanding via Panoramic Reprojection

OneCanvas enables 3D scene understanding in Vision-Language Models by aggregating patch features onto a single panoramic canvas using 3D world coordinates. It achieves state-of-the-art performance on SQA3D and VSI-Bench, and generalizes to out-of-distribution data on SPBench, using significantly less training compute than existing methods.

arxiv arXiv cs.LG · 8d ago

Zero-Overhead Telemetry Detects Hidden ML Training

A study evaluates GPU workload classification using only zero-overhead NVML telemetry. The classifier achieves 98.2% accuracy in identifying training workloads and 43-87% accuracy against adversarially disguised, unexpected workloads across 9 GPU models.

arxiv arXiv cs.LG · 8d ago

Diffusion-Proof: First Framework for Diffusion LLMs in Formal Theorem Proving

Diffusion-Proof is the first framework to train and apply diffusion language models for formal theorem proving. It introduces dLLM-Prover-7B for whole-proof writing with long-range coherence and dLLM-Corrector-7B for local proof correction using bidirectional information. The framework outperforms auto-regressive LLM baselines by 1.61% on ProofNet-Test and 6.14% on MiniF2F-Test, and solves an IMO problem beyond the capability of DeepSeek-Prover-V2-7B.