All articles — korshunov.ai

JetSpec: Speculative Decoding with Parallel Tree Drafting Enables up to 9.64x Lossless LLM Inference Speedup

JetSpec introduces a speculative decoding method called causal parallel tree drafting that co-optimizes drafting cost and quality to reduce LLM generation latency. The approach achieves up to 9.64x end-to-end speedup on MATH-500 and 4.58x on open-ended chat while maintaining lossless accuracy.

media r/LocalLLaMA · 5h ago

US Govt to individually approve who gets GPT 5.6.

A Reddit post by /u/AtlanticHM on r/LocalLLaMA shares an image titled "US Govt to individually approve who gets GPT 5.6."

media r/LocalLLaMA · 5h ago

Resetting NVIDIA RTX 3090 Idle Power Consumption

A user reports that while driver version 595.71.05 previously allowed dual RTX 3090s to drop to 13-15W when idle, one card is now stuck at 24-30W with zero activity and fans off.

media r/LocalLLaMA · 5h ago

Prices of graphic cards are going crazy, should I buy a second card though?

A user on r/LocalLLaMA is considering adding a second GPU to their rig for local LLM inference but is deterred by the sharp increase in prices for AMD Radeon RX 7900 XTX and XT cards. The poster notes that while new RX 7900 XTX prices have risen to 1200€, used units are around 900€, and the budget-friendly RX 7900 XT starts at 700€.

media r/LocalLLaMA · 5h ago

Handling per-agent isolation and environment lifecycle in an orchestration library

The author details the architecture of a harness-agnostic orchestration library, focusing on managing agent environments through distinct workspace and runtime abstractions. The system defines four sequential states (unprovisioned, provisioned, started, and retired) to control the lifecycle of each agent instance.
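
The four-state lifecycle can be pictured as a small state machine. The sketch below is illustrative only: the names `AgentState`, `AgentEnvironment`, and `advance` are hypothetical and not taken from the library, and it assumes the states are visited strictly in order, which the summary does not confirm.

```python
from enum import Enum


class AgentState(Enum):
    """The four lifecycle states named in the post, in order."""
    UNPROVISIONED = 0
    PROVISIONED = 1
    STARTED = 2
    RETIRED = 3


class AgentEnvironment:
    """Hypothetical per-agent environment that only moves forward one state at a time."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = AgentState.UNPROVISIONED

    def advance(self) -> AgentState:
        # Assumption: no state may be skipped and RETIRED is terminal.
        if self.state is AgentState.RETIRED:
            raise RuntimeError(f"{self.name} is retired and cannot advance")
        self.state = AgentState(self.state.value + 1)
        return self.state
```

Calling `advance()` three times on a fresh `AgentEnvironment` walks it from unprovisioned to retired; a fourth call raises `RuntimeError`. A real library would likely attach provisioning and teardown work to each transition.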

media r/LocalLLaMA · 5h ago

Qwen 3.6 27b GLM 5.2 fine-tune?

A Reddit user questions the absence of a Qwen 3.6 27B model fine-tuned with GLM 5.2, noting that both models feature open weights and GLM is recognized for its reasoning capabilities. The poster speculates whether the lack of such a fine-tune is due to the recency of GLM 5.2 or a general lack of community interest in combining these specific models.

github llama.cpp · 5h ago

llama.cpp b9825 Release: Vulkan Fix and Cross-Platform Binaries

The llama.cpp project has released version b9825, which includes a fix for the Vulkan step operator when handling zero inputs. This update provides pre-built binaries for macOS, Linux, Windows, Android, and openEuler across various hardware backends.

github llama.cpp · 5h ago

llama.cpp b9826 release with SYCL norm fix

The llama.cpp project has published the b9826 release, which includes a fix for failed unit test cases related to the norm function in SYCL. This update provides pre-built binaries and frameworks across multiple platforms and hardware accelerators.

media Hugging Face Forums · 6h ago

The Checklist You Write Forces AI to Stop

This article argues that AI agents often execute actions based on incomplete instructions by guessing missing information, a problem termed "pre-execution confirmation failure." It proposes a runtime-enforced structure that requires verifying knowns and unknowns before any action is taken.
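
One way to picture a runtime-enforced check of knowns and unknowns is a gate that refuses to run an action while any required input is missing. This is a minimal sketch, not the article's implementation: `run_if_confirmed` and `UnknownsRemain` are hypothetical names, and treating a missing or `None` value as "unknown" is an assumption.

```python
from typing import Any, Callable, Mapping, Sequence


class UnknownsRemain(Exception):
    """Raised when an action is attempted while required inputs are still unknown."""


def run_if_confirmed(
    required: Sequence[str],
    known: Mapping[str, Any],
    action: Callable[[Mapping[str, Any]], Any],
) -> Any:
    # Assumption: an input counts as unknown if it is absent or set to None.
    missing = [key for key in required if known.get(key) is None]
    if missing:
        # Stop instead of guessing; the caller must resolve these first.
        raise UnknownsRemain(f"unresolved inputs: {', '.join(missing)}")
    return action(known)
```

The point of the design is that the check runs before the action and cannot be bypassed by the agent filling gaps with guesses: the only way forward is to supply the missing values.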

github CrewAI · 6h ago

crewAI 1.15.1 Release Notes

The crewAI version 1.15.1 update introduces new features for project initialization and deployment, alongside several bug fixes and documentation improvements.

github llama.cpp · 6h ago

llama.cpp b9822 release with macOS, Linux, Windows binaries

The llama.cpp project has published the b9822 release, providing pre-built binaries for macOS, iOS, Linux, Android, and Windows. This update includes a fix for the test-chat-template --no-common option and distributes builds across various hardware architectures and accelerators.

github llama.cpp · 7h ago

llama.cpp b9823 release adds Windows OpenVINO and updates binaries

The llama.cpp project has published version b9823, providing pre-built binaries for macOS, iOS, Linux, Android, Windows, and openEuler platforms. A key change in this release is the addition of a Windows OpenVINO build to the check-release pipeline.

github llama.cpp · 7h ago

llama.cpp b9824 release: binary renaming and new builds

The llama.cpp project has released version b9824, which renames the `rpc-server` and `export-graph-ops` binaries. The `export-graph-ops` tool is renamed to follow test naming conventions, while `rpc-server` is renamed to `ggml-rpc-server` to avoid conflicts in system directories.

media Hugging Face Forums · 13h ago

User Requests Deletion of Account Posting Porn, Gore, and Nazi Content

A user on the Hugging Face forums is requesting the deletion of the account 'cerealpotatochipssea' for uploading prohibited content. The report alleges that the account has shared 18+ material, gore, and Nazi-related imagery.

github CrewAI · 13h ago

CrewAI 1.15.1a1 Release Notes

The CrewAI 1.15.1a1 update introduces new telemetry tracking, enforces explicit project definitions for CrewAI, and improves the CLI deployment workflow.

github vLLM · 16h ago

v0.24.0

The v0.24.0 release includes a continuous integration update to raise the GSM8K startup timeout for MoE Refactor Qwen3 NVFP4 configurations.

lab OpenAI News · 18h ago

OpenAI previews GPT-5.6 Sol, Terra, and Luna models

OpenAI has initiated a limited preview of the GPT-5.6 series, introducing three new models: Sol as the flagship, Terra for balanced everyday work, and Luna for fast, affordable tasks. The company plans to make these models generally available in the coming weeks following this initial phase with trusted partners.

github llama.cpp · 18h ago

llama.cpp b9821 Release: CLI Flags and Multi-Platform Binaries

The llama.cpp project has released version b9821, which introduces command-line interface updates allowing users to invoke --version, --licenses, and --help flags. This release provides a comprehensive set of pre-built binaries for macOS, Linux, Android, Windows, and openEuler across various hardware accelerators.

media Hugging Face Forums · 19h ago

Mission: Build RAG System for Endangered Spoken Language

A job posting seeks an experienced NLP or LLM engineer to develop the first Retrieval-Augmented Generation (RAG) localization engine for a low-resource language spoken in South America. The project utilizes a proprietary corpus of pedagogical content and dictionary data developed over four years.

lab Claude Code Releases · 19h ago

Claude Code v2.1.195 Release Notes

Claude Code version 2.1.195 introduces several fixes and improvements, including new environment variables for mouse control in fullscreen mode and corrections to hook matcher logic.