korshunov.ai — ML news

Results

Sort

Lab Topic Source

Claude Code v2.1.181 Release Notes

Claude Code v2.1.181 introduces support for setting config settings via prompt syntax like /config thinking=false, adds sandbox Apple Events support on macOS, and improves streaming, auto-retry, and subagent behavior. It also fixes numerous bugs related to startup, file handling, clipboard, and UI responsiveness across platforms.

lab Claude Code Releases · 10d ago

Claude v2.1.178 Release Notes

Claude v2.1.178 introduces new permission rules using Tool(param:value) syntax, improved workflow and skill loading in nested directories, and enhanced auto mode and error messaging. It fixes critical issues including crashes, authentication errors, and UI behavior in Chrome and VSCode, while refining tool prompts and undo functionality.

arxiv arXiv cs.AI · 8d ago

Spotlight: Using Spot GPUs to Accelerate DiT RL Post-Training

Spotlight enables DiT RL post-training by leveraging idle spot GPUs, reducing costs by 1.4-6.4× while achieving superior image quality. It uses stale model weights in exploration and reconfigures sequence parallelism in real time, allowing efficient GPU utilization without breaking training pipelines.

arxiv arXiv cs.AI · 8d ago

RODS: Reward-Driven Online Data Synthesis for Multi-Turn Tool-Use Agents

RODS addresses sample depletion in multi-turn tool-use RL by using reward variance to detect capability boundaries. It synthesizes new data in real time, matching structural complexity of boundary samples, and maintains a dynamic replay buffer that co-evolves with the policy. RODS achieves performance comparable to a 17K-sample offline pipeline with 20x fewer trajectories.

arxiv arXiv cs.AI · 8d ago

AdsMind: Physics-Grounded Multi-Agent System for Adsorption Discovery

AdsMind is a closed-loop multi-agent system that uses machine learning force fields and feedback to correct errors in adsorption configuration searches on catalyst surfaces. It achieves 100% and 98.8% success rates on AA20 and OCD-GMAE62 benchmarks, reduces energy dispersion by 14-fold compared to baselines, and maintains correct adsorption-energy signs in DFT validation, outperforming open-loop LLM agents.

github llama.cpp · 9d ago

ggml-cpu: Conditionally enable POWER11 backend based on compiler support

The ggml-cpu project now conditionally enables the POWER11 backend in ggml based on compiler support for -mcpu=power11. This prevents build failures on current GCC/Clang toolchains while maintaining forward compatibility. Updates to CMakeLists.txt support this change, and -mcpu=power10 is used for both P10 and P11 architectures.

github llama.cpp · 9d ago

llama.cpp Release b9692 Adds New Binaries and Fixes

llama.cpp version b9692 introduces new binaries for macOS, Linux, Android, Windows, and openEuler across multiple architectures. The release includes updates to support Vulkan, ROCm, OpenVINO, SYCL, and HIP, with fixes to remove batch dim usage in llava_uhd.

github llama.cpp · 9d ago

Metal backend adds f16 and bf16 support for concat operator

The Metal backend in llama.cpp has been extended to support f16 and bf16 tensor types for the concat operator, in addition to existing f32 and i32 support. This update includes specialized kernel templates, updated pipeline getters, and improved type-based kernel dispatch, with assistance from pi:llama.cpp/Qwen3.6-27B.

github llama.cpp · 9d ago

llama.cpp Release b9687 Adds New Binaries and Fixes

llama.cpp version b9687 introduces new binaries for macOS, Linux, Android, Windows, and openEuler across multiple architectures. The release includes support for Vulkan, ROCm, OpenVINO, SYCL, and HIP, with updates to improve device validation and performance on available hardware.

github llama.cpp · 9d ago

llama.cpp releases version b9688 with new APIs and cross-platform binaries

llama.cpp releases version b9688, adding model management and SSE realtime updates APIs. The release includes prebuilt binaries for macOS, Linux, Android, Windows, and openEuler, supporting various architectures and acceleration frameworks like Vulkan, CUDA, OpenVINO, and SYCL.

github llama.cpp · 9d ago

LLaMA.cpp Release b9685 Adds SYCL Dev2Dev Memcpy and Multiple Platform Binaries

LLaMA.cpp version b9685 introduces SYCL-based dev2dev memcpy functionality, moving GGML_SYCL_DEV2DEV_MEMCPY to runtime table and improving peer-to-peer communication detection. The release includes precompiled binaries for macOS, Linux, Android, Windows, and openEuler across multiple architectures and APIs including Vulkan, ROCm, OpenVINO, and SYCL (FP32/FP16).

github llama.cpp · 9d ago

LLaMA.cpp Release b9684 Adds Conv_3D and Multiple Platform Binaries

LLaMA.cpp release b9684 introduces a new 3D convolution operation (conv_3d) and includes optimized implementations. The release provides prebuilt binaries for macOS, Linux, Android, Windows, and openEuler across various architectures and hardware acceleration options, including SYCL, Vulkan, CUDA, and OpenVINO.

github llama.cpp · 9d ago

llama.cpp release b9682 adds Vulkan support and new platform binaries

llama.cpp version b9682 introduces Vulkan support for Linux and Windows, enabling GPU acceleration. The release includes binaries for macOS, Linux, Android, Windows, and openEuler across multiple architectures, with CPU and GPU options including CUDA, OpenVINO, SYCL, and ROCm.

github llama.cpp · 9d ago

llama.cpp release b9675 adds FP16 support and new platform binaries

llama.cpp version b9675 enables FP16 support for operations like SQR, SQRT, LOG, SIN, COS, and CLAMP. The release includes binaries for macOS, Linux, Android, Windows, and openEuler across multiple architectures, with support for Vulkan, ROCm, OpenVINO, SYCL (FP16 and FP32), and CUDA 12.4 and 13.3.

github llama.cpp · 9d ago

llama.cpp release b9680: new binaries and Vulkan support

llama.cpp releases version b9680 with updated Vulkan support and new binaries for macOS, Linux, Android, Windows, and openEuler. The release includes CPU and GPU variants for multiple architectures, with support for Vulkan, CUDA, OpenVINO, SYCL, and ROCm.

arxiv arXiv cs.LG · 9d ago

LegalHalluLens: Auditing Hallucinations in Legal AI

LegalHalluLens introduces a framework to audit AI hallucinations in legal contexts by analyzing typed hallucination profiles across four claim categories. It reveals a 38-40 point gap between obligation/numeric and temporal claims, and shows two systems with identical 52% hallucination rates can have opposite risk directions. The framework uses a Risk Direction Index and calibrated debate pipelines to reduce fabricated detections by 45%, offering actionable diagnostics for trustworthy legal AI deployment.

arxiv arXiv cs.LG · 9d ago

NoiseTilt: Noise-Tilted Reverse Kernels for Diffusion Reward Alignment

NoiseTilt introduces NTRK, a reward-guided diffusion sampler that injects reward gradients via the noise term without altering the reverse kernel. By using a whitening operator, NTRK safely biases noise toward high reward, preserving sample quality while maintaining strong guidance. On aesthetic generation, NTRK achieves superior reward performance with 25 NFEs, reducing compute by 20× compared to state-of-the-art baselines.

arxiv arXiv cs.LG · 9d ago

Compositional Generalization in Language Model Reasoning

A hierarchical latent selection model shows that supervised fine-tuning and reinforcement learning work together to enable compositional generalization in language models. SFT provides raw module materials, while RL identifies and recombines atomic modules from compound traces to solve new problems. Training on compound traces leads to stronger generalization than isolated module training, and an effective protocol is found where SFT ensures module coverage and RL drives exploration of novel compositions.

arxiv arXiv cs.LG · 9d ago

Handlebars Triple-Brace Injection Exploits Structural Role Delimiters

Handlebars' triple-brace interpolation fails to protect against structural role injection, as HTML escaping only neutralizes angle-bracket delimiters. It leaves colon and Markdown hash delimiters intact, enabling attackers to hijack model behavior. The default escaping provides no protection for most role delimiter schemes and cannot replace a clear separation of instructions and data.

arxiv arXiv cs.CL · 9d ago

LegalHalluLens: Auditing Hallucinations in Legal AI

LegalHalluLens introduces a framework to audit AI hallucinations in legal contexts by analyzing typed hallucination profiles across four claim categories. It reveals a 38-40 point gap between obligation/numeric and temporal claims, and shows two systems with identical 52% hallucination rates can have opposite risk directions. The framework uses a Risk Direction Index and calibrated debate pipelines to reduce fabricated detections by 45% and improve accountability in legal AI deployment.