All articles — korshunov.ai


llama.cpp b9827 release adds CUDA 2D async copy optimization

The llama.cpp b9827 release introduces a performance optimization for CUDA by adding a cudaMemcpy2DAsync fast path to the ggml_cuda_cpy function. This change accelerates same-type, same-shape strided copies where tensors are not fully contiguous but each row is contiguous, replacing slower element-wise scalar copy kernels.
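As a rough illustration of the copy pattern described above, the sketch below models a 2D pitched copy on the host in plain Python: each row is a contiguous run of bytes, but consecutive rows are separated by a pitch larger than the row width. The argument order mirrors `cudaMemcpy2DAsync` (`dst`, `dpitch`, `src`, `spitch`, `width`, `height`); the function name `memcpy_2d` and the sample buffers are hypothetical, and this is not the code in `ggml_cuda_cpy`.

```python
def memcpy_2d(dst, dpitch, src, spitch, width, height):
    """Copy `height` rows of `width` bytes each.

    Row r starts at byte r * spitch in `src` and r * dpitch in `dst`,
    so each row is contiguous even when the buffer as a whole is not.
    """
    if width > spitch or width > dpitch:
        raise ValueError("width must not exceed either pitch")
    for row in range(height):
        s = row * spitch
        d = row * dpitch
        # One bulk copy per row instead of one assignment per element.
        dst[d:d + width] = src[s:s + width]


# Three rows of 4 payload bytes, stored with 2 padding bytes per row (pitch 6).
src = bytearray(b"abcd..efgh..ijkl..")
dst = bytearray(12)  # tightly packed destination (pitch 4)
memcpy_2d(dst, 4, src, 6, 4, 3)
print(bytes(dst))
```

The point of the pattern is that the per-row work is a single bulk transfer, which is what lets a 2D copy call replace a kernel that moves one element at a time.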

media r/LocalLLaMA · 4h ago

BatonBot: Open Source Local Kanban Workflow for AI Coding Agents

The author introduces BatonBot, an open-source local-first application designed to streamline AI coding workflows by reducing the need for constant user supervision. The tool addresses the inefficiency of sequential agent interactions by allowing users to set up tasks and track progress visually on a Kanban-style board.

media r/LocalLLaMA · 4h ago

audio.cpp: 12 audio models in one C++ runtime with up to 5x speedup

The open-source project audio.cpp provides a native C++ inference framework for audio models built on top of ggml, currently supporting 12 released model families including TTS, ASR, and voice conversion. Benchmarks on Ubuntu/CUDA demonstrate that text-to-speech performance in this runtime is up to 5x faster than the corresponding Python reference implementations.

blog Simon Willison · 4h ago

Bruce Schneier on AI Liability and German Ruling

Bruce Schneier discusses a recent German ruling that holds Google liable for errors in its AI overviews, arguing that AI agents should be treated as agents of the deploying organization.

media r/LocalLLaMA · 4h ago

JetSpec: Speculative Decoding with Parallel Tree Drafting Enables up to 9.64x Lossless LLM Inference Speedup

JetSpec introduces a speculative decoding method called causal parallel tree drafting that co-optimizes drafting cost and quality to reduce LLM generation latency. The approach achieves up to 9.64x end-to-end speedup on MATH-500 and 4.58x on open-ended chat while maintaining lossless accuracy.

media r/LocalLLaMA · 4h ago

US Govt to individually approve who gets GPT 5.6.

A Reddit post by /u/AtlanticHM on r/LocalLLaMA shares an image titled "US Govt to individually approve who gets GPT 5.6."

media r/LocalLLaMA · 4h ago

Resetting NVIDIA RTX 3090 Idle Power Consumption

A user reports that while driver version 595.71.05 previously allowed dual RTX 3090s to drop to 13-15W when idle, one card is now stuck at 24-30W with zero activity and fans off.

media r/LocalLLaMA · 4h ago

Prices of graphic cards are going crazy, should I buy a second card though?

A user on r/LocalLLaMA is considering adding a second GPU to their rig for local LLM inference but is deterred by the sharp increase in prices for AMD Radeon RX 7900 XTX and XT cards. The poster notes that while new RX 7900 XTX prices have risen to 1200€, used units are around 900€, and the budget-friendly RX 7900 XT starts at 700€.

media r/LocalLLaMA · 4h ago

Handling per-agent isolation and environment lifecycle in an orchestration library

The author details the architecture of a harness-agnostic orchestration library, focusing on managing agent environments through distinct workspace and runtime abstractions. The system defines four sequential states—unprovisioned, provisioned, started, and retired—to control the lifecycle of each agent instance.
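A minimal sketch of such a four-state lifecycle follows. The state names come from the summary above; the class `AgentEnvironment`, the method `advance`, and the assumption that transitions only ever move forward one step at a time are hypothetical and may not match the library's actual API.

```python
from enum import Enum


class State(Enum):
    UNPROVISIONED = 0
    PROVISIONED = 1
    STARTED = 2
    RETIRED = 3


class AgentEnvironment:
    """Hypothetical per-agent environment that moves through the states in order."""

    def __init__(self):
        self.state = State.UNPROVISIONED

    def advance(self):
        # Assumption: strictly sequential, no skipping and no going back.
        if self.state is State.RETIRED:
            raise RuntimeError("a retired environment cannot advance")
        self.state = State(self.state.value + 1)
        return self.state


env = AgentEnvironment()
env.advance()  # unprovisioned -> provisioned
env.advance()  # provisioned -> started
env.advance()  # started -> retired
```

Encoding the order in the enum values keeps the transition rule in one place; a library that allows, say, restarting a stopped agent would need an explicit transition table instead.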

media r/LocalLLaMA · 5h ago

Qwen 3.6 27b GLM 5.2 fine-tune?

A Reddit user questions the absence of a Qwen 3.6 27B model fine-tuned with GLM 5.2, noting that both models feature open weights and GLM is recognized for its reasoning capabilities. The poster speculates whether the lack of such a fine-tune is due to the recency of GLM 5.2 or a general lack of community interest in combining these specific models.

github llama.cpp · 5h ago

llama.cpp b9825 Release: Vulkan Fix and Cross-Platform Binaries

The llama.cpp project has released version b9825, which includes a fix for the Vulkan step operator when handling zero inputs. This update provides pre-built binaries for macOS, Linux, Windows, Android, and openEuler across various hardware backends.

github llama.cpp · 5h ago

llama.cpp b9826 release with SYCL norm fix

The llama.cpp project has published the b9826 release, which includes a fix for failed unit test cases related to the norm function in SYCL. This update provides pre-built binaries and frameworks across multiple platforms and hardware accelerators.

media Hugging Face Forums · 5h ago

The Checklist You Write Forces AI to Stop

This article argues that AI agents often execute actions based on incomplete instructions by guessing missing information, a problem termed "pre-execution confirmation failure." It proposes a runtime-enforced structure that requires verifying knowns and unknowns before any action is taken.

github CrewAI · 5h ago

crewAI 1.15.1 Release Notes

The crewAI version 1.15.1 update introduces new features for project initialization and deployment, alongside several bug fixes and documentation improvements.

github llama.cpp · 5h ago

llama.cpp b9822 release with macOS, Linux, Windows binaries

The llama.cpp project has published the b9822 release, providing pre-built binaries for macOS, iOS, Linux, Android, and Windows. This update includes a fix for the test-chat-template --no-common option and distributes builds across various hardware architectures and accelerators.

github llama.cpp · 6h ago

llama.cpp b9823 release adds Windows OpenVINO and updates binaries

The llama.cpp project has published version b9823, providing pre-built binaries for macOS, iOS, Linux, Android, Windows, and openEuler platforms. A key change in this release is the addition of a Windows OpenVINO build to the check-release pipeline.

github llama.cpp · 6h ago

llama.cpp b9824 release: binary renaming and new builds

The llama.cpp project has released version b9824, which includes improvements to the rpc-server and export-graph-ops binaries. The `export-graph-ops` tool is renamed to follow test naming conventions, while `rpc-server` is renamed to `ggml-rpc-server` to avoid conflicts in system directories.

media Hugging Face Forums · 12h ago

User Requests Deletion of Account Posting Porn, Gore, and Nazi Content

CrewAI 1.15.1a1 Release Notes

v0.24.0