All articles — korshunov.ai

Open Interpreter 0.0.17 Released

Open Interpreter has released version 0.0.17. The update introduces new features and improvements to its core functionality, enhancing user interaction and task execution capabilities.

media r/LocalLLaMA · 12d ago

Local Agent Web Access via SearXNG and Scrapling

A local agent can access the web without paid APIs by using self-hosted SearXNG for search and Scrapling with Trafilatura for page extraction. The setup avoids vendor dependencies, uses open-source tools, and delivers search results and page content in Markdown format, with fallbacks for CAPTCHAs and security challenges.

media r/LocalLLaMA · 12d ago

Local agent on 4090 - looking for LM Studio settings

A user reports slow token generation when running a local agent on a 4090 with 24GB VRAM, despite adjusting context and batching settings. They note Gemma4 performs faster but produces incorrect tokens like </tool_call>, and seek recommended settings and explanations for parameters such as top_p and top_k.

media r/LocalLLaMA · 12d ago

SupraLabs Releases supra-title-FFT-preview with 115K Samples

SupraLabs has launched supra-title-FFT-preview, a chat title generation model trained on 115K samples from a filtered dataset, expanding coverage beyond its previous 12K-sample model. The model uses full fine-tuning on LiquidAI/LFM2.5-350M-Base with BF16 precision and is designed for single-purpose chat title generation, available via Hugging Face and supporting direct loading or vLLM deployment.

media r/LocalLLaMA · 12d ago

RTX 5090 MSI Power Usage and Cable Warning

A user reports that an MSI RTX 5090 draws 475-500 W during inference or diffusion training. They have had no issues with the power cable but stress that it should not be bent, to keep operation safe and stable.

media r/LocalLLaMA · 12d ago

Attention Algebra — a grammar that translates natural language into spectrograms

Attention Algebra is a prototype that translates natural language into algebraic expressions, maps them to mathematical dynamics, and visualizes the result as a spectrogram. It treats language as a lossy projection of high-dimensional states, proposing that raw attention patterns grouped into functions serve as the 'DNA' of text, enabling efficient reasoning chains by reducing token usage from 20k to 4k.

github llama.cpp · 12d ago

llama.cpp Release b9732: New Binaries and Updates

llama.cpp version b9732 ships updated binaries for macOS, Linux, Android, Windows, and openEuler. The release includes refactored child-to-router communication, fixes to wakeup handling, an improved update_status(), and documentation. New builds support Vulkan, ROCm, OpenVINO, SYCL, and CUDA 12/13 on multiple architectures.

media r/LocalLLaMA · 12d ago

I benchmarked Claude's 'Fast C++'. It wasn't faster

A user tested Claude's claimed 'Fast C++' implementation and found it did not outperform standard C++ in benchmarks. The post includes a link to a Substack article detailing the testing process and results.

github llama.cpp · 12d ago

ggml-webgpu Adds F16 Adapter Toggles for Vulkan and NVIDIA

The ggml-webgpu project has added adapter toggles for half-precision (F16) support on Vulkan and NVIDIA GPUs. This update enables improved performance on compatible hardware across multiple platforms, including macOS, Linux, Android, Windows, and openEuler, with specific builds available for ARM and x64 architectures.

media r/LocalLLaMA · 12d ago

$1800 GPU cost runs Qwen3.6-27B with 262K context and 55 tok/s

A setup using four 5060 Ti GPUs (totaling $1800) achieves 55 tokens per second with Qwen3.6-27B-FP8, supporting a 262K context length and a bfloat16 KV cache. The configuration uses P2P and FlashInfer, with benchmark results showing an output throughput of 55.67 tokens per second and a 65.25% speculative decoding acceptance rate.

blog Simon Willison · 12d ago

Sean Lynch on MCP's Auth Flow Isolation

Sean Lynch highlights that the Model Context Protocol (MCP) offers a key advantage by isolating authentication flows outside the agent's context window. He suggests the ideal form of MCP could be a simple auth gateway for APIs, which would still represent a significant improvement.

github llama.cpp · 12d ago

llama.cpp Release b9731: Performance Optimization and Cross-Platform Binaries

llama.cpp version b9731 introduces an optimization that uses std::partial_sort to reduce token sorting overhead, cutting top-n token selection time from 8.555 ms to 0.704 ms. The release includes prebuilt binaries for macOS, Linux, Android, Windows, and openEuler across multiple architectures and hardware acceleration options.
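To illustrate why std::partial_sort helps with top-n selection, here is a minimal sketch. It is not llama.cpp's actual code: the TokenProb struct and the top_n_tokens function are hypothetical names chosen for this example. The idea is that only the first n elements need to be ordered, so the rest of the vocabulary is never fully sorted.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical token record; illustrative only, not a llama.cpp type.
struct TokenProb {
    int id;
    float prob;
};

// Return the n most probable tokens in descending order of probability.
// std::partial_sort orders only the first n elements, which costs roughly
// O(V log n) comparisons for V candidates instead of O(V log V) for a full sort.
std::vector<TokenProb> top_n_tokens(std::vector<TokenProb> candidates, std::size_t n) {
    n = std::min(n, candidates.size());
    const auto middle = candidates.begin() + static_cast<std::ptrdiff_t>(n);
    std::partial_sort(candidates.begin(), middle, candidates.end(),
                      [](const TokenProb& a, const TokenProb& b) {
                          return a.prob > b.prob;  // higher probability first
                      });
    candidates.resize(n);  // drop the unordered tail
    return candidates;
}
```

Calling top_n_tokens(candidates, 40) on a full-vocabulary candidate list would return the 40 highest-probability entries. The saving grows as n gets small relative to the vocabulary size, which is the typical case for top-n sampling.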

github llama.cpp · 12d ago

llama.cpp release b9730: fixes and new binaries

llama.cpp version b9730 includes fixes for UTF-8 handling on Windows and improvements to ggml_fopen and CLI. The release provides binaries for macOS, Linux, Android, Windows, and openEuler across multiple architectures and hardware acceleration options, including Vulkan, CUDA, OpenVINO, and SYCL.

media r/LocalLLaMA · 12d ago

Best Local Agents - Jun 2026

A discussion thread collects recommendations for the best local AI agents currently available, emphasizing open-weight models and execution on local hardware. The post defines 'agents' as autonomous software that determines its own actions rather than following pre-programmed steps, distinguishing them from tools like IFTTT or Apple Shortcuts. Its rules require local deployment, with open-source agent software as the primary focus.

github Open Interpreter · 12d ago

Rust Release 0.0.12

The Open Interpreter repository has published a release titled Rust 0.0.12. The version number refers to an early release from that project, not to the Rust programming language itself.

github Open Interpreter · 12d ago

Rust Release 0.0.13

The Open Interpreter repository has published a release titled Rust 0.0.13. The version number refers to an early release from that project, not to the Rust programming language itself.

github Open Interpreter · 12d ago

Rust Release 0.0.14

The Open Interpreter repository has published a release titled Rust 0.0.14. The version number refers to an early release from that project, not to the Rust programming language itself.

media r/LocalLLaMA · 12d ago

Help Running Local Hermes Agent with llama-cpp

A user reports issues running a local Hermes AI agent on a high-end rig using self-compiled llama-cpp. The setup reprocesses the KV cache about every 5 messages and reasons slowly, and the agent repeatedly pauses to report progress instead of continuing autonomously. The user asks whether their llama-cpp parameters are incorrect and which adjustments would improve agent performance and allow sustained reasoning without interruptions.

media r/LocalLLaMA · 12d ago

Maximizing Performance of 2x3090 with NVLink

A user reports achieving only 60 tokens per second in short bursts, and an average of 40-45 tokens per second, when running Qwen 3.6 27B with Q8_0 quantization on two GeForce RTX 3090 GPUs connected via NVLink. The setup includes Ubuntu 24.04, a Ryzen 9 7950X3D, and 64GB of DDR5, with the display routed through an eGPU.

github llama.cpp · 12d ago

llama.cpp Release b9729: New Binaries and Platform Support

llama.cpp version b9729 ships binaries for macOS, Linux, Android, Windows, and openEuler across multiple architectures. The release includes CPU, Vulkan, OpenVINO, SYCL, and ROCm support, along with a new UI package. Internal references to 'webui' have been removed.