All articles — korshunov.ai

audio.cpp: 12 audio models in one C++ runtime with up to 5x speedup

The open-source project audio.cpp provides a native C++ inference framework for audio models, built on top of ggml, that currently supports 12 released model families, including TTS, ASR, and voice conversion. Benchmarks on Ubuntu with CUDA show text-to-speech running up to 5x faster in this runtime than in the corresponding Python reference implementations.

blog Simon Willison · 4h ago

Bruce Schneier on AI Liability and a German Court Ruling

Bruce Schneier discusses a recent German ruling that holds Google liable for errors in its AI Overviews, arguing that AI agents should be treated as agents of the deploying organization.

media r/LocalLLaMA · 4h ago

JetSpec: Speculative Decoding with Parallel Tree Drafting Enables up to 9.64x Lossless LLM Inference Speedup

JetSpec introduces a speculative decoding method called causal parallel tree drafting, which co-optimizes drafting cost and quality to reduce LLM generation latency. The approach achieves up to 9.64x end-to-end speedup on MATH-500 and 4.58x on open-ended chat while remaining lossless, i.e. without changing the target model's output distribution.
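The summary does not describe how JetSpec's causal parallel tree drafting works internally, so the sketch below shows only the basic draft-then-verify loop that speculative decoding methods build on, in its simplest form: a linear draft of `k` tokens with greedy verification. It is not JetSpec's algorithm; tree-drafting approaches instead propose several candidate continuations and verify them together. The toy `toy_target` and `toy_draft` functions and all function names are hypothetical stand-ins for real models. The point it illustrates is why the method can be lossless: every token that ends up in the output is exactly what the target model would have chosen at that position.

```python
from typing import Callable, List

# Toy stand-in for a model: maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Reference: plain autoregressive greedy decoding with the target model."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Linear (non-tree) speculative decoding with greedy verification."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1. Draft: the cheap model proposes k tokens autoregressively.
        ctx = list(out)
        proposal = []
        for _ in range(k):
            token = draft(ctx)
            proposal.append(token)
            ctx.append(token)
        # 2. Verify: keep drafted tokens while they match the target's greedy choice.
        #    A real system scores all k positions in one batched target pass.
        for token in proposal:
            expected = target(out)
            out.append(expected)  # always the target's token, so output is lossless
            if expected != token:
                break  # first mismatch: discard the rest of the draft
        else:
            out.append(target(out))  # every draft token accepted: one bonus token
    return out[: len(prompt) + max_new]


# Hypothetical toy models: the draft disagrees with the target at every third position.
def toy_target(ctx: List[int]) -> int:
    return (sum(ctx) * 7 + len(ctx)) % 10


def toy_draft(ctx: List[int]) -> int:
    guess = toy_target(ctx)
    return guess if len(ctx) % 3 else (guess + 1) % 10
```

Calling `speculative_decode(toy_target, toy_draft, [1, 2], 20)` returns the same sequence as `greedy_decode(toy_target, [1, 2], 20)`. The quality of the draft only changes how many drafted tokens are accepted per verification step, which is where the speedup comes from, not what the final output is.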

media r/LocalLLaMA · 4h ago

US Govt to individually approve who gets GPT 5.6.

A Reddit post by /u/AtlanticHM on r/LocalLLaMA shares an image titled "US Govt to individually approve who gets GPT 5.6".

media r/LocalLLaMA · 4h ago

Resetting NVIDIA RTX 3090 Idle Power Consumption

A user reports that while driver version 595.71.05 previously allowed dual RTX 3090s to drop to 13-15W when idle, one card is now stuck at 24-30W with zero activity and fans off.

media r/LocalLLaMA · 4h ago

Prices of graphic cards are going crazy, should I buy a second card though?

A user on r/LocalLLaMA is considering adding a second GPU to their rig for local LLM inference but is deterred by sharply rising prices for AMD Radeon RX 7900 XTX and XT cards. The poster notes that new RX 7900 XTX cards have risen to 1200€, used ones sell for around 900€, and the cheaper RX 7900 XT starts at 700€.

media r/LocalLLaMA · 4h ago

Handling per-agent isolation and environment lifecycle in an orchestration library

The author details the architecture of a harness-agnostic orchestration library, focusing on managing agent environments through distinct workspace and runtime abstractions. The system defines four sequential states—unprovisioned, provisioned, started, and retired—to control the lifecycle of each agent instance.
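As a rough illustration of the four sequential states, here is a minimal Python sketch of a per-agent lifecycle tracker. The class and method names are hypothetical, not the library's API. It assumes transitions are strictly forward with no skipped states, and the comments linking provisioning to the workspace and starting to the runtime are likewise assumptions based on the workspace/runtime split described above.

```python
from enum import Enum


class AgentState(Enum):
    UNPROVISIONED = 0
    PROVISIONED = 1
    STARTED = 2
    RETIRED = 3


# Assumed strictly forward transitions: each state may only advance to the next one.
_NEXT = {
    AgentState.UNPROVISIONED: AgentState.PROVISIONED,
    AgentState.PROVISIONED: AgentState.STARTED,
    AgentState.STARTED: AgentState.RETIRED,
}


class AgentLifecycle:
    """Hypothetical per-agent lifecycle tracker; not the library's real API."""

    def __init__(self, agent_id: str) -> None:
        self.agent_id = agent_id
        self.state = AgentState.UNPROVISIONED

    def _advance(self, target: AgentState) -> None:
        expected = _NEXT.get(self.state)  # None once the agent is retired
        if target is not expected:
            raise ValueError(
                f"{self.agent_id}: cannot go from {self.state.name} to {target.name}")
        self.state = target

    def provision(self) -> None:  # assumption: create the agent's isolated workspace
        self._advance(AgentState.PROVISIONED)

    def start(self) -> None:  # assumption: launch the runtime inside that workspace
        self._advance(AgentState.STARTED)

    def retire(self) -> None:  # assumption: stop the runtime and tear down the workspace
        self._advance(AgentState.RETIRED)
```

Rejecting out-of-order calls, such as `start()` before `provision()` or any call after `retire()`, is one simple way to keep an orchestrator from launching a runtime into a workspace that does not exist yet or reusing one that has been torn down.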

media r/LocalLLaMA · 5h ago

Qwen 3.6 27b GLM 5.2 fine-tune?

A Reddit user questions the absence of a Qwen 3.6 27B model fine-tuned with GLM 5.2, noting that both models feature open weights and GLM is recognized for its reasoning capabilities. The poster speculates whether the lack of such a fine-tune is due to the recency of GLM 5.2 or a general lack of community interest in combining these specific models.

github llama.cpp · 5h ago

llama.cpp b9825 Release: Vulkan Fix and Cross-Platform Binaries

The llama.cpp project has released version b9825, which includes a fix for the Vulkan step operator when handling zero inputs. This update provides pre-built binaries for macOS, Linux, Windows, Android, and openEuler across various hardware backends.

github llama.cpp · 5h ago

llama.cpp b9826 release with SYCL norm fix

The llama.cpp project has published the b9826 release, which fixes failing unit tests for the norm function in the SYCL backend. This update provides pre-built binaries and frameworks across multiple platforms and hardware accelerators.

media Hugging Face Forums · 5h ago

The Checklist You Write Forces AI to Stop

This article argues that AI agents often execute actions based on incomplete instructions by guessing missing information, a problem termed "pre-execution confirmation failure." It proposes a runtime-enforced structure that requires verifying knowns and unknowns before any action is taken.
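A minimal sketch of the idea, assuming a checklist is simply a list of required facts that must each be confirmed before an action runs. `Checklist`, `run_guarded`, and `PreExecutionBlocked` are hypothetical names, not taken from the article. The point it shows is that the check is enforced in code at runtime, so the agent cannot skip it by guessing a missing value.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Checklist:
    """Hypothetical pre-execution checklist: every required item must be a confirmed known."""
    required: List[str]
    knowns: Dict[str, str] = field(default_factory=dict)

    def unknowns(self) -> List[str]:
        # Anything required but not yet confirmed is still an unknown.
        return [name for name in self.required if name not in self.knowns]


class PreExecutionBlocked(Exception):
    """Raised when an action is attempted while unknowns remain."""


def run_guarded(checklist: Checklist, action: Callable[[Dict[str, str]], str]) -> str:
    missing = checklist.unknowns()
    if missing:
        # Stop and surface the gaps instead of letting the agent fill them in.
        raise PreExecutionBlocked(f"unresolved unknowns: {', '.join(missing)}")
    return action(checklist.knowns)
```

With `required=["recipient", "amount"]` and only `recipient` confirmed, `run_guarded` raises instead of calling the action; once `amount` is added to `knowns`, the action runs with the confirmed values.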

github CrewAI · 5h ago

crewAI 1.15.1 Release Notes

The crewAI version 1.15.1 update introduces new features for project initialization and deployment, alongside several bug fixes and documentation improvements.

github llama.cpp · 5h ago

llama.cpp b9822 release with macOS, Linux, Windows binaries

The llama.cpp project has published the b9822 release, providing pre-built binaries for macOS, iOS, Linux, Android, and Windows. This update includes a fix for the `test-chat-template` `--no-common` option and distributes builds across various hardware architectures and accelerators.

github llama.cpp · 6h ago

llama.cpp b9823 release adds Windows OpenVINO and updates binaries

The llama.cpp project has published version b9823, providing pre-built binaries for macOS, iOS, Linux, Android, Windows, and openEuler platforms. A key change in this release is the addition of a Windows OpenVINO build to the check-release pipeline.

github llama.cpp · 6h ago

llama.cpp b9824 release: binary renaming and new builds

The llama.cpp project has released version b9824, which includes improvements to the rpc-server and export-graph-ops binaries. The `export-graph-ops` tool is renamed to follow test naming conventions, while `rpc-server` is renamed to `ggml-rpc-server` to avoid conflicts in system directories.

media Hugging Face Forums · 12h ago

User Requests Deletion of Account Posting Porn, Gore, and Nazi Content

A user on the Hugging Face forums is requesting the deletion of the account 'cerealpotatochipssea' for uploading prohibited content. The report alleges that the account has shared 18+ material, gore, and Nazi-related imagery.

github CrewAI · 12h ago

CrewAI 1.15.1a1 Release Notes

The CrewAI 1.15.1a1 update introduces new telemetry tracking, enforces explicit project definitions for CrewAI, and improves the CLI deployment workflow.

github vLLM · 15h ago

vLLM v0.24.0

The vLLM v0.24.0 release includes a continuous integration (CI) update that raises the GSM8K startup timeout for the MoE Refactor Qwen3 NVFP4 configurations.

lab OpenAI News · 17h ago

OpenAI previews GPT-5.6 Sol, Terra, and Luna models

OpenAI has initiated a limited preview of the GPT-5.6 series, introducing three new models: Sol as the flagship, Terra for balanced everyday work, and Luna for fast, affordable tasks. The company plans to make these models generally available in the coming weeks following this initial phase with trusted partners.

github llama.cpp · 17h ago

llama.cpp b9821 Release: CLI Flags and Multi-Platform Binaries

The llama.cpp project has released version b9821, which updates the command-line interface so that users can invoke the `--version`, `--licenses`, and `--help` flags. This release provides a comprehensive set of pre-built binaries for macOS, Linux, Android, Windows, and openEuler across various hardware accelerators.