All articles — korshunov.ai


Add libandroid-spawn dependency for Android build

The Android build documentation now lists libandroid-spawn as a dependency. The addition is intended to support building in that environment.

media r/LocalLLaMA · 9d ago

Gemma 4 31B Q6 vs Gemma 4 31B QAT Comparison

A Reddit discussion compares Gemma 4 31B at Q6 quantization with the Gemma 4 31B QAT (quantization-aware training) variant, focusing on creative writing performance. Users ask which variant gives better overall results and whether KL divergence (KLD) is a useful metric of model quality.
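For context on the KLD question: one common way to score a quantized model is to run the same text through it and through a higher-precision reference, then average the per-token KL divergence between their next-token probability distributions (lower means closer to the reference). The sketch below is a minimal Python illustration under that assumption; the three-token distributions are made-up toy values, not measurements from either Gemma variant.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats for two discrete distributions over the same vocabulary.

    p: reference probabilities (e.g. a higher-precision model's next-token distribution)
    q: probabilities from the quantized model at the same position
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:  # terms with p_i == 0 contribute nothing
            total += pi * math.log(pi / max(qi, eps))  # eps guards against log(x / 0)
    return total


def mean_kld(ref_dists, test_dists):
    """Average per-token KL divergence across a sequence of positions."""
    scores = [kl_divergence(p, q) for p, q in zip(ref_dists, test_dists)]
    return sum(scores) / len(scores)


# Toy example: two token positions over a hypothetical 3-token vocabulary.
reference = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]]
quantized = [[0.65, 0.25, 0.10], [0.5, 0.3, 0.2]]
print(round(mean_kld(reference, quantized), 6))
```

Identical distributions score exactly zero, so any positive value measures how far the quantized model drifts from the reference on that text.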

media r/LocalLLaMA · 9d ago

Local text-to-image model comparison: The ultimate test

A test of local text-to-image models on a GX10 Spark used 192 prompts to assess capabilities such as text understanding, face generation, and spatial composition. Results are published on ImageBench alongside comparisons to frontier APIs using vision language models; all prompts and images are publicly accessible.

media r/LocalLLaMA · 9d ago

Workflow for programmers with slow local LLM setup

Users share their workflows for coding with local LLMs when token generation is below 10 tokens per second. Common strategies include using concise prompts, leveraging local models with minimal context, and batching queries to maximize efficiency.

media r/LocalLLaMA · 9d ago

Your Favorite Workflow to Convert PDF with Complex Structure to Markdown?

A user asks about tools for converting PDFs with complex structures, such as tables and floating boxes, into Markdown. They have tried markitdown, Docling, and MinerU and are looking for better alternatives.

media r/LocalLLaMA · 9d ago

Agent recommendations for Python web project setup

A user seeks software stack recommendations for building a Python web project in PyCharm with local LLMs. They want agent systems that can generate plans, execute code, and run tests. So far they have used GPT-OSS and Qwen models and have noticed differences between them in performance and output quality.

media r/LocalLLaMA · 9d ago

Finally seeing benefits of MTP after removing GGML_CUDA_ALLREDUCE

A user reported that removing the GGML_CUDA_ALLREDUCE environment variable noticeably improved throughput in tokens per second (TPS) when using MTP (multi-token prediction) for local LLM inference. The variable had previously been considered beneficial, but after extensive configuration trials, removing it unexpectedly reduced overhead and improved performance.

media r/LocalLLaMA · 9d ago

Hermes Agent Looks Ugly and Has Poor UX

A user expresses disappointment with Hermes Agent, citing ugly fonts and graphics and a sluggish UX in both its web and terminal interfaces. Despite its promise of built-in features and ease of use, the user finds it significantly slower and less intuitive than Pi Mono Agent, especially with Qwen3.6-35B and Gemma4-26B models.

media r/LocalLLaMA · 10d ago

Leaderboard for quantized models, similar to Artificial Analysis?

Artificial Analysis's model leaderboard helps compare model intelligence but does not account for quantization of open models. Users ask whether there is a better way to compare quantized open models with proprietary ones without running them themselves.

media r/LocalLLaMA · 10d ago

Not a new model, just a Happy Father's Day and a thank you

A Reddit user posts a personal thank-you to the LocalLLaMA community rather than a new model. Writing as a dad, they describe the community as a refuge amid family life and say they appreciate the discussions on setup, hardware, and model tuning.

media r/LocalLLaMA · 10d ago

Local LLM Inference Optimization: The Complete Guide

A comprehensive guide to optimizing local LLM inference covers VRAM management, KV cache, MoE placement, MTP, CPU tuning, and common out-of-memory issues. The guide is available at https://carteakey.dev/blog/local-inference/local-llm-optimization/, and the author is asking for feedback.
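As a rough illustration of the KV cache topic, a common back-of-the-envelope estimate for a standard transformer with multi-head or grouped-query attention is 2 (one K and one V tensor) × layers × KV heads × head dimension × context length × bytes per element. The sketch below assumes that formula and a hypothetical model shape; it is not taken from the guide, and architectures with sliding-window attention, MLA, or hybrid layers need different accounting.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2, n_seqs=1):
    """Approximate KV cache size for a standard (MHA/GQA) transformer.

    The factor 2 counts one K tensor and one V tensor per layer.
    bytes_per_elem: 2 for f16/bf16, 1 for an 8-bit cache (ignoring quantization block overhead).
    n_seqs: number of parallel sequences sharing the same context length.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem * n_seqs


# Hypothetical model shape, not taken from any specific model card.
size = kv_cache_bytes(n_layers=48, n_kv_heads=8, head_dim=128, n_ctx=32768)
print(f"{size / 2**30:.2f} GiB")  # prints 6.00 GiB
```

The estimate scales linearly with context length and cache precision, which is why shrinking the context or switching to an 8-bit cache are typical first steps against out-of-memory errors.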

media r/LocalLLaMA · 10d ago

GLM-5.2 Released: Results on DeepSWE Benchmark

GLM-5.2 has been evaluated on the DeepSWE benchmark, with its result highlighted in the top-right corner of the visualization. The post notes that scores decrease as price increases, points to the DeepSWE website and Artificial Analysis for alternative evaluations, and addresses criticisms of, and historical context around, benchmark validity.

lab OpenAI News · 10d ago

Samsung Deploys ChatGPT and Codex for Employees

Samsung Electronics has rolled out OpenAI's ChatGPT Enterprise and Codex to its global workforce. This deployment represents one of OpenAI's largest enterprise AI initiatives to date.

blog Simon Willison · 10d ago

Cloudflare Launches Temporary Accounts for AI Agents

Cloudflare now allows users to deploy Workers applications without a permanent account by running npx wrangler deploy --temporary. Each deployment runs in an ephemeral project that stays live for 60 minutes and comes with a claim link for taking ownership; if ownership is not claimed, the link expires in under an hour.

blog Simon Willison · 10d ago

sqlite-utils 4.0rc1 Release

sqlite-utils 4.0rc1 introduces migration support and nested transactions. The release is documented on Simon Willison's blog.

blog Simon Willison · 10d ago

sqlite-utils 4.0rc1 Adds Migrations and Nested Transactions

sqlite-utils 4.0rc1 introduces database migrations and db.atomic() for nested transactions. Migrations support script-based schema changes through a simplified API, while db.atomic() implements nested transactions with savepoints, improving error handling and data integrity. The release also includes backwards-incompatible changes, such as updated upsert behavior and dropped support for Python 3.8, with options to retain some of the older behaviors.
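To show what nested transactions via savepoints mean in practice, here is a minimal sketch using only Python's standard-library sqlite3 module. The atomic() helper below is a hypothetical stand-in written for illustration, not sqlite-utils' actual db.atomic() implementation or API: each nesting level opens a SQLite SAVEPOINT, releases it on success, and rolls back to it on error.

```python
import itertools
import sqlite3
from contextlib import contextmanager

_savepoint_ids = itertools.count()


@contextmanager
def atomic(conn):
    """Hypothetical nested-transaction helper built on SQLite savepoints.

    On success the savepoint is RELEASEd (merged into the enclosing transaction,
    or committed if it is the outermost one). On error, work since the savepoint
    is rolled back, the savepoint is removed, and the exception is re-raised.
    """
    name = f"sp_{next(_savepoint_ids)}"
    conn.execute(f"SAVEPOINT {name}")
    try:
        yield conn
    except BaseException:
        conn.execute(f"ROLLBACK TO {name}")
        conn.execute(f"RELEASE {name}")
        raise
    else:
        conn.execute(f"RELEASE {name}")


# isolation_level=None disables the sqlite3 module's implicit BEGIN, so the
# outermost SAVEPOINT starts the transaction itself.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE items (name TEXT)")

with atomic(conn):
    conn.execute("INSERT INTO items VALUES ('outer')")
    try:
        with atomic(conn):
            conn.execute("INSERT INTO items VALUES ('inner')")
            raise ValueError("simulated failure")
    except ValueError:
        pass  # only the inner savepoint was rolled back

rows = [r[0] for r in conn.execute("SELECT name FROM items")]
print(rows)  # prints ['outer']
```

Because the inner failure only rolls back to its own savepoint, the outer insert survives and is committed when the outermost savepoint is released.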

media r/LocalLLaMA · 10d ago

Qwen 27B for planning, Qwen 35B-A3B for execution

A user explores using Qwen 27B for long-horizon task planning and Qwen 35B-A3B for rapid execution, noting the 27B runs at 7-10 tokens per second and the 35B-A3B at ~18 tokens per second. The user considers switching between models to leverage their different strengths, though currently uses the 35B-A3B exclusively and questions whether the intelligence gap between models is significant.
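To put the quoted speeds in perspective, the sketch below converts tokens per second into decode time for an assumed 1,000-token response. It ignores prompt processing, which can add substantially to the total with long contexts, so real wall-clock times will be higher.

```python
def generation_seconds(n_tokens, tokens_per_second):
    """Wall-clock time to decode n_tokens at a steady rate (prompt processing ignored)."""
    return n_tokens / tokens_per_second


# Rates quoted in the post; the 1,000-token output length is an assumed example.
n = 1000
for label, tps in [("27B @ 7 t/s", 7), ("27B @ 10 t/s", 10), ("35B-A3B @ 18 t/s", 18)]:
    print(f"{label}: {generation_seconds(n, tps):.0f} s")
```

At these rates the dense 27B takes roughly two to three times as long per response as the 35B-A3B, which is the cost the planning/execution split would have to justify.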

github llama.cpp · 10d ago

llama.cpp release b9750: new call statement and cross-platform binaries

llama.cpp version b9750 introduces a call statement implementation and rolls back an unintended change. The release includes precompiled binaries for macOS, Linux, Android, Windows, and openEuler across multiple architectures and hardware acceleration options, including Vulkan, CUDA, OpenVINO, and SYCL.

media r/LocalLLaMA · 10d ago

Updated Vision Model Benchmark Results and Recommendations

A revised benchmark of local vision language models evaluates 23 models on 30 images with 3 tests per image, totaling 2,070 tests and 60 to 70 hours of inference. The top-performing model is Qwen3.6 27B (nothink) at Q4 with a score of 79.6, followed by Qwen3.5 4B (nothink) at Q4 and Qwen3-VL 8B at Q8. Key findings: thinking mode degrades vision performance, MoE models underperform dense models, and Q8 quantization does not consistently improve results.
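The test count follows directly from the setup, and the quoted inference time implies an average duration per test. The sketch below assumes the tests ran one after another with all 60 to 70 hours spent on inference, which the post does not state.

```python
models, images, tests_per_image = 23, 30, 3
total_tests = models * images * tests_per_image
print(total_tests)  # prints 2070

# Average wall-clock time per test implied by the quoted 60-70 inference hours,
# assuming sequential runs with no idle time.
for hours in (60, 70):
    print(f"{hours} h -> {hours * 3600 / total_tests:.0f} s per test")
```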

media r/LocalLLaMA · 10d ago

Qwen 3.6 27B Apostate Released with Safety Removed

The Qwen 3.6 27B model has been modified with Apostate to remove its safety alignment, reducing the refusal rate from 92% to 7.6%. The modification has minimal impact on the model's capabilities, with a KL divergence of 0.120 from the original model.