All articles — korshunov.ai

Best local LLM for English story summarization

A user asks which local LLM currently does the best job of summarizing long English stories, with an emphasis on models that run locally and stay accurate across multi-page narratives.

media r/LocalLLaMA · 11d ago

GLM 5.2 UD IQ2_M produces best pelican SVG image ever seen

A user shares an image generated by the GLM 5.2 UD IQ2_M model, calling it the best pelican SVG image they have ever seen. Despite the aggressive low-bit quantization, the model shows strong capabilities, and the user expects it to perform significantly better on future high-end hardware setups.

github llama.cpp · 11d ago

ggml optimizes AMX with partition flattening

The ggml project has optimized its AMX code path by flattening the work partition over n_batch * M, so that all threads participate in quantization. The change is up to 1.47x faster, with consistent reductions in inference time across the models and hardware configurations tested.
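
To illustrate why flattening helps, here is a minimal Python sketch, not ggml's actual C code. It compares a hypothetical partition that splits only the M rows across threads (each thread then loops over every batch) with one that splits the flattened n_batch * M range evenly. The function names and the exact pre-change scheme are assumptions for illustration.

```python
def row_only_partition(n_batch, m, n_threads):
    """Hypothetical pre-change scheme: split only the M rows across threads.

    Each thread handles its rows for every batch, so when M is smaller than
    the thread count, some threads receive no work at all.
    """
    per_thread = -(-m // n_threads)  # ceiling division
    work = []
    for t in range(n_threads):
        start, end = t * per_thread, min(m, (t + 1) * per_thread)
        work.append([(b, r) for b in range(n_batch) for r in range(start, end)])
    return work


def flat_partition(n_batch, m, n_threads):
    """Split the flattened range of n_batch * m (batch, row) items evenly."""
    base, extra = divmod(n_batch * m, n_threads)
    work, start = [], 0
    for t in range(n_threads):
        size = base + (1 if t < extra else 0)  # first `extra` threads take one more item
        work.append([divmod(i, m) for i in range(start, start + size)])  # i -> (batch, row)
        start += size
    return work


def busy_threads(work):
    """Count threads that were assigned at least one item."""
    return sum(1 for items in work if items)


print(busy_threads(row_only_partition(4, 1, 8)))  # 1
print(busy_threads(flat_partition(4, 1, 8)))      # 4
```

With M = 1, the row-only split leaves seven of eight threads idle, while the flattened split gives each of four threads one (batch, row) item.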

github llama.cpp · 11d ago

GLM-5.2 DSA indexer fix: tensors marked not required

For GLM-5.2, the loader expected DSA indexer tensors on every layer, so loading failed on layers that do not have them. The fix marks the indexer tensors as TENSOR_NOT_REQUIRED: on layers without an indexer they load as nullptr, and those layers use full MLA attention. DeepSeek-V3.2, which has an indexer on every layer, is unaffected.
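
The loading behaviour described above can be sketched in Python. This is not llama.cpp's loader: the tensor names (blk.N.indexer), the Layer type, and the string flags are hypothetical stand-ins, with TENSOR_NOT_REQUIRED borrowed from the summary and None playing the role of nullptr.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for the loader's per-tensor flags.
TENSOR_REQUIRED = "TENSOR_REQUIRED"
TENSOR_NOT_REQUIRED = "TENSOR_NOT_REQUIRED"


@dataclass
class Layer:
    index: int
    indexer: Optional[str]  # None plays the role of nullptr


def load_tensor(weights: dict, name: str, flag: str) -> Optional[str]:
    """Return the tensor, None for an absent optional tensor, or raise."""
    if name in weights:
        return weights[name]
    if flag == TENSOR_NOT_REQUIRED:
        return None
    raise KeyError(f"missing required tensor: {name}")


def load_layers(weights: dict, n_layers: int) -> list:
    # Loading with TENSOR_REQUIRED instead would raise on the first
    # layer that has no indexer, which is the failure described above.
    return [
        Layer(i, load_tensor(weights, f"blk.{i}.indexer", TENSOR_NOT_REQUIRED))
        for i in range(n_layers)
    ]


def attention_kind(layer: Layer) -> str:
    """Layers with an indexer use sparse attention; the rest fall back to full MLA."""
    return "sparse (DSA indexer)" if layer.indexer is not None else "full MLA"


weights = {"blk.0.indexer": "w0", "blk.2.indexer": "w2"}  # layer 1 has no indexer
print([attention_kind(layer) for layer in load_layers(weights, 3)])
# ['sparse (DSA indexer)', 'full MLA', 'sparse (DSA indexer)']
```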

github llama.cpp · 11d ago

Docker prebuild web UI for s390x

A pull request has been submitted to add a prebuilt web UI to the Docker images for the s390x architecture. The change has not yet been included in a published release.

media r/LocalLLaMA · 11d ago

World's Biggest Chat Title Dataset Released by SupraLabs

SupraLabs has released a curated chat title dataset with 115K samples, surpassing the previous record of 10K samples. The filtered dataset is available as `SupraLabs/chat-titles-filtered-115K`, while an unfiltered version with 150K samples is also provided, along with a legacy 12K dataset.

media Latent Space · 11d ago

Latent Space Subscribers Get $250 Discount for AIE WF 2026

Latent Space subscribers receive a limited-time $250 discount on AIE WF 2026 tickets. Attendees also receive $40k in sponsor credits from companies like Warp, Datadog, SourceGraph, Stripe, and Fireworks.

media r/LocalLLaMA · 11d ago

Best Settings for 48GB VRAM with Qwen 3.6 27B

A user shares optimized settings for running Qwen 3.6 27B at Q8_0 quantization with llama.cpp on an RTX 4090 plus RTX 3090 setup. The configuration splits tensors across both GPUs, offloads all layers to GPU (the layer count of 999 is simply larger than the model's), and uses a 250k context, speculative decoding, and a unified KV cache. It reaches 75-100 t/s with vision and MTP support enabled.
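
As a rough check on why Q8_0 fits in 48 GB, the weight size can be estimated from the Q8_0 block layout: each block of 32 weights stores 32 int8 values plus one fp16 scale, i.e. 34 bytes or 8.5 bits per weight. This Python sketch assumes exactly 27e9 parameters, all stored as Q8_0, and an even 1:1 tensor split; the real model file, vision projector, and the post's split ratio will differ.

```python
Q8_0_BLOCK_VALUES = 32  # weights per Q8_0 block
Q8_0_BLOCK_BYTES = 34   # 32 int8 values + one fp16 scale


def q8_0_bytes(n_params: float) -> float:
    """Approximate weight size if every parameter is stored as Q8_0."""
    return n_params * Q8_0_BLOCK_BYTES / Q8_0_BLOCK_VALUES


def split_bytes(total: float, ratios: list) -> list:
    """Divide a byte count across GPUs in proportion to a tensor-split ratio."""
    s = sum(ratios)
    return [total * r / s for r in ratios]


weights = q8_0_bytes(27e9)  # assumed parameter count
print(f"{weights / 1e9:.2f} GB")                                 # 28.69 GB
print([f"{b / 1e9:.2f}" for b in split_bytes(weights, [1, 1])])  # ['14.34', '14.34']
```

That leaves roughly 19 GB of the nominal 48 GB for the KV cache, compute buffers, the vision projector, and any draft model used for speculative decoding.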

media r/LocalLLaMA · 11d ago

Help with a Local Document RAG System (Storage + Ingestion + Query + Highlighting)

A user is designing a local, offline document retrieval and LLM pipeline with storage, ingestion, query, and highlighting features. They seek advice on vector databases (e.g., pgvector in Postgres vs Qdrant), GraphRAG feasibility offline, and open-source tools for document highlighting with citations.
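
The query and highlighting stages of such a pipeline can be sketched in a few lines of Python. To stay self-contained, this sketch swaps in a toy bag-of-words vector for a real embedding model and an in-memory list for pgvector or Qdrant; the chunk sizes and function names are hypothetical. What it shows is that storing character offsets with each chunk at ingestion time is what makes citation highlighting possible later.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    start: int  # character offsets into the original document,
    end: int    # kept at ingestion time so hits can be highlighted later
    text: str


def chunk_document(doc_id: str, text: str, size: int = 200, overlap: int = 50) -> list:
    """Split text into overlapping character windows (sizes are arbitrary)."""
    step = size - overlap
    return [
        Chunk(doc_id, start, min(start + size, len(text)), text[start:start + size])
        for start in range(0, max(len(text) - overlap, 1), step)
    ]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list, k: int = 3) -> list:
    """Rank chunks by cosine similarity to the query and keep the top k."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c.text)), reverse=True)[:k]


def highlight(text: str, chunk: Chunk, mark: str = "**") -> str:
    """Wrap the cited span in markers using the stored offsets."""
    return text[:chunk.start] + mark + text[chunk.start:chunk.end] + mark + text[chunk.end:]


docs = {
    "a": "Invoices are kept in Postgres tables.",
    "b": "Qdrant stores the embedding vectors.",
    "c": "The UI highlights cited passages.",
}
chunks = [c for doc_id, text in docs.items() for c in chunk_document(doc_id, text)]
top = retrieve("which database stores vectors", chunks, k=1)[0]
print(highlight(docs[top.doc_id], top))  # **Qdrant stores the embedding vectors.**
```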

media r/LocalLLaMA · 11d ago

7900XTX 24GB VRAM Runs Qwen 3.6 27B with 131k Context

A user reports running a Q6_K quantization of Qwen 3.6 27B with MTP (multi-token prediction) and a 131k context on a 7900XTX with 24GB of VRAM. This is made possible by quantizing the KV cache (Q5_0/Q4_0), which reduces VRAM usage by 12% compared to Q8; with specific compile flags and llama.cpp arguments, the model runs at 55-60 tokens per second.
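
The effect of KV cache quantization can be estimated from the ggml block sizes: q8_0 uses 34 bytes per 32 values, q5_0 uses 22, and q4_0 uses 18. The Python sketch below reads "Q5_0/Q4_0" as Q5_0 keys and Q4_0 values, and uses hypothetical model dimensions (16 attention layers, 4 KV heads, head size 256), not Qwen 3.6 27B's actual configuration. It covers only the cache itself, so its ratio is not comparable to the post's 12% figure.

```python
# Bytes per cached value for ggml types: block size in bytes / 32 values per block.
BYTES_PER_VALUE = {
    "q8_0": 34 / 32,  # fp16 scale + 32 int8 values
    "q5_0": 22 / 32,  # fp16 scale + 4 bytes of high (5th) bits + 16 bytes of low 4-bit values
    "q4_0": 18 / 32,  # fp16 scale + 16 bytes of 4-bit values
}


def kv_cache_bytes(n_ctx, n_layers, n_kv_heads, head_dim, k_type, v_type):
    """Size of the K and V caches for n_ctx tokens across all attention layers."""
    values_per_cache = n_ctx * n_layers * n_kv_heads * head_dim
    return values_per_cache * (BYTES_PER_VALUE[k_type] + BYTES_PER_VALUE[v_type])


# Hypothetical dimensions, not Qwen 3.6 27B's real configuration.
dims = dict(n_ctx=131072, n_layers=16, n_kv_heads=4, head_dim=256)
q8 = kv_cache_bytes(**dims, k_type="q8_0", v_type="q8_0")
mixed = kv_cache_bytes(**dims, k_type="q5_0", v_type="q4_0")
print(f"{q8 / 2**30:.2f} GiB")     # 4.25 GiB
print(f"{mixed / 2**30:.2f} GiB")  # 2.50 GiB
print(f"{mixed / q8:.2f}")         # 0.59
```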

media r/LocalLLaMA · 11d ago

GLM 5.2 Achieves 98% of Max Intelligence with Less Than Half the Tokens

According to a technical report by z_ai, GLM 5.2 reaches 98% of its maximum intelligence on coding tasks while using less than half of its total token budget. The report presents this as a significant gain in reasoning efficiency, although token usage rose from 16.7k with GLM 5.1 to 36.7k with GLM 5.2, and the highest settings may strain local hardware.

media r/LocalLLaMA · 11d ago

AMD Future GPU Offerings for LLM Builds

AMD has announced upcoming GPU offerings that could support local large language model (LLM) deployments. These GPUs are designed with enhanced memory bandwidth and compute capabilities, making them suitable for efficient LLM inference and training in dedicated local rigs.

media r/LocalLLaMA · 11d ago

llama.cpp B70 SYCL Benchmark Results

Benchmarks of llama.cpp with the SYCL backend on the B70 show good performance on models such as gemma4 12B and 26B, with throughput of up to 5662.45 t/s for the E2B model. In the tg128 (token generation) test, performance is much lower, with qwen35 27B reaching only 15.42 t/s, indicating room for optimization.

media r/LocalLLaMA · 11d ago

Local AI for Local Office Files

A Reddit user asks which AI agent is best for handling local office files like Excel, PDF, Word, and JSON. The post seeks user experiences and implemented workflows for such tasks.

media r/LocalLLaMA · 11d ago

Tool calling issue in open-source Qwen3.6 27B 8K

Users report that the Qwen3.6 27B 8K model occasionally stops after emitting a tool call, especially when the user steps away. Manually pasting the tool call back into the prompt works around the issue and lets the model resume. In the reported case, the tool call runs a bash command to find passing tests in a codebase.

media r/LocalLLaMA · 11d ago

What is the best book for learning ML/Deep Learning maths?

A user asks for book recommendations to build a solid mathematical foundation for understanding and contributing to machine learning and deep learning, particularly AI architectures and large language models. They note that intuition only goes so far without the underlying math, and want structured resources to complement what they are already learning from channels like 3Blue1Brown (3b1b).

github Open Interpreter · 11d ago

Rust Release 0.0.15

The Open Interpreter repository has published a release titled Rust 0.0.15. This is an early release of the project's Rust code, not a version of the Rust programming language.

github Open Interpreter · 11d ago

Open Interpreter 0.0.16 Released

Open Interpreter has released version 0.0.16. The update introduces new features and improvements to its core functionality, enhancing user interaction and task execution capabilities.

github Open Interpreter · 11d ago

Open Interpreter 0.0.17 Released

Open Interpreter has released version 0.0.17. The update introduces new features and improvements to its core functionality, enhancing user interaction and task execution capabilities.

media r/LocalLLaMA · 11d ago

Local Agent Web Access via SearXNG and Scrapling

A local agent can access the web without paid APIs by using self-hosted SearXNG for search and Scrapling with Trafilatura for page extraction. The setup avoids vendor dependencies, uses open-source tools, and delivers search results and page content in Markdown format, with fallbacks for CAPTCHAs and security challenges.
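
The search-and-extract flow can be sketched in Python. The sketch assumes a SearXNG instance with the JSON output format enabled in its settings, and a recent trafilatura release that accepts output_format="markdown". Fetching the page HTML (done with Scrapling in the described setup) is left out, and the challenge-detection markers are hypothetical heuristics, not part of either tool.

```python
import json
from urllib.parse import urlencode

# Hypothetical heuristics for spotting CAPTCHA or bot-check pages.
CHALLENGE_MARKERS = ("captcha", "cf-challenge", "verify you are human")


def build_search_url(base_url: str, query: str, page: int = 1) -> str:
    """SearXNG search URL; format=json only works if enabled in the instance settings."""
    params = urlencode({"q": query, "format": "json", "pageno": page})
    return f"{base_url.rstrip('/')}/search?{params}"


def parse_results(payload: str, limit: int = 5) -> list:
    """Keep title, URL and snippet from a SearXNG JSON response."""
    data = json.loads(payload)
    return [
        {"title": r.get("title", ""), "url": r.get("url", ""), "snippet": r.get("content", "")}
        for r in data.get("results", [])[:limit]
    ]


def results_to_markdown(results: list) -> str:
    return "\n".join(f"- [{r['title']}]({r['url']}): {r['snippet']}" for r in results)


def page_to_markdown(html: str, fallback_snippet: str) -> str:
    """Extract page text as Markdown, falling back to the search snippet."""
    if any(marker in html.lower() for marker in CHALLENGE_MARKERS):
        return fallback_snippet  # challenge page: do not try to extract it
    import trafilatura  # imported lazily; assumes trafilatura is installed
    text = trafilatura.extract(html, output_format="markdown")
    return text or fallback_snippet  # extract() returns None when nothing is found


sample = '{"results": [{"title": "SearXNG docs", "url": "https://docs.searxng.org", "content": "Search API"}]}'
print(build_search_url("http://localhost:8080", "local llm"))
# http://localhost:8080/search?q=local+llm&format=json&pageno=1
print(results_to_markdown(parse_results(sample)))
# - [SearXNG docs](https://docs.searxng.org): Search API
```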