All articles — korshunov.ai

Proposal for crowd-sourced, open-source distilled LLMs via distributed training

A Reddit user proposes a system to create truly open-source distilled large language models by wrapping existing command-line AI services. This approach would collect user inputs and outputs from applications like coding assistants or chatbots to build massive datasets through volunteer participation.

media r/LocalLLaMA · 3h ago

DeepSpec: A DeepSeek AI Collection for Speculative Decoding Draft Models

DeepSpec is a full-stack codebase released by deepseek-ai for training and evaluating draft models used in speculative decoding. The project provides data preparation utilities, implementation code, and evaluation scripts to facilitate the development of these auxiliary models.

github llama.cpp · 4h ago

llama.cpp b9831 release adds DFlash support and new binaries

The llama.cpp b9831 release introduces DFlash v2 support, including sliding window attention per layer types, alongside a comprehensive set of pre-built binaries for multiple platforms.

media r/LocalLLaMA · 4h ago

DFlash support merged into llama.cpp

Support for DFlash has been merged into the llama.cpp repository, making it available to users of the framework.

media r/LocalLLaMA · 5h ago

Step-3.7-Flash (198B-A11B vision MoE) on 4×3090 — fully-resident IQ3_XXS beats the spilled IQ4 by 2.4×, and MTP speculative decode silently breaks vision

A user demonstrates running StepFun's 198B-parameter Step-3.7-Flash model on a consumer 4×RTX 3090 setup, revealing critical performance trade-offs between quantization levels and multi-token prediction (MTP) with vision capabilities.

media r/LocalLLaMA · 5h ago

What would it take to create /r/localllama's own LLM?

A Reddit user expresses concern over potentially losing access to open-weight models suited to 96GB to 128GB hardware and asks whether a community-driven large language model is feasible.

media r/LocalLLaMA · 6h ago

Sell ddr5 for vram?

A Reddit user asks whether they should sell half of their 768GB of DDR5-6400 ECC RAM to purchase RTX 6000 Pro GPUs, citing current RAM prices.

media r/LocalLLaMA · 6h ago

Seeking advice on cases for dual RTX 3090 LLM workstation

A user is building a local LLM workstation using an ASUS Crosshair VIII Hero motherboard and two power-limited RTX 3090 GPUs, seeking recommendations for compatible computer cases.

media r/LocalLLaMA · 6h ago

Qwen3.6 27B local vs Opus 4.8, voxel engine in raw C with zero frameworks

A comparison experiment pitted Claude Code on Opus 4.8 against a locally running Qwen3.6 27B model to build a voxel world engine in plain C without any external frameworks or libraries.

media r/LocalLLaMA · 6h ago

User questions existence of closed vs open LLM rankings and value of 70B-350B models

A Reddit user asks whether a solid leaderboard exists that compares closed-source and open-weight large language models side by side. They note that most available benchmarks feel fragmented and fail to address the practical differences between running models locally versus using API-based services.

media r/LocalLLaMA · 6h ago

Community inquiry on using Q1/Q2 quantization for large language models

A Reddit user asks the community about their experiences using Q1 or Q2 quantization levels for large language models ranging from 100 to 250 billion parameters. The post lists specific models in this size range, such as DeepSeek-V4-Flash and Qwen3-235B-A22B, and contrasts them with smaller models where lower quantization is generally discouraged.

github llama.cpp · 6h ago

llama.cpp b9830 release adds --offline flag and fixes memory bug

The llama.cpp b9830 release introduces the ability to use the --offline flag with the llama download command, allowing scripts to verify cached models without network access. This update also resolves a latent use-after-free vulnerability in the URL-task on_done callback where first_path was incorrectly captured by reference.

media Hugging Face Forums · 7h ago

User Requests Account Recovery for zhoucantd

A user on the Hugging Face forums asks whether their account, registered under the username "zhoucantd", can be recovered. The thread on this request involves two participants.

media Hugging Face Forums · 7h ago

UCTF: A Universal Compressed Training Format for Multilingual AI

A new concept called UCTF (Universal Compressed Training Format) proposes a mediator layer to address semantic redundancy in multilingual LLM training by compressing diverse languages into a unified, language-agnostic token format.

media Hugging Face Forums · 7h ago

Creating a Website Chat Widget with Gradio Part IV

A user reports that their previously functional AI chatbot widget on their website has stopped working due to a CORS policy error after a recent Gradio update. The error indicates that the 'Access-Control-Allow-Credentials' header in the response is empty, which conflicts with the client's request credentials mode.

media Hugging Face Forums · 7h ago

The language as carrier of intelligence: Beyond token prediction

This article argues that large language models derive their apparent intelligence from the deep geometric relationships and hidden states within language itself, rather than from independent mechanical computation or simple token prediction.

media r/LocalLLaMA · 7h ago

DuckDuckGo is blocking with a CAPTCHA. Let me try other approaches:

A user on the LocalLLaMA subreddit reports that their local llama.cpp-based LLM began hitting DuckDuckGo CAPTCHA blocks this morning. The post asks whether other users are seeing similar issues with DuckDuckGo's anti-bot measures.

media r/LocalLLaMA · 7h ago

What are companies actually using for self-hosted AI right now, and why?

A Reddit user is soliciting real-world data on enterprise deployments of self-hosted artificial intelligence, distinguishing actual production use from hobbyist testing.

media r/LocalLLaMA · 7h ago

Reddit post highlights biometric requirements for GPT 5.6 Sol preview

A Reddit user shared an image depicting a mock application interface requiring face scanning, fingerprint checking, and passport verification to join the GPT 5.6 Sol preview. The post characterizes these stringent identity verification steps as unusual or "wild" for accessing a model preview.

media r/LocalLLaMA · 8h ago

A barebones CPU-only inference engine for Qwen 3, written from scratch in pure C

A developer has released a pure C implementation of an inference engine designed specifically for Qwen 3 models of 4B parameters and below. The project is available on GitHub as a learning resource that prioritizes code readability and educational value over raw performance.