All articles — korshunov.ai

All articles Page 1 / 108

What would it take to create /r/localllama's own LLM?

A Reddit user expresses concern over the potential loss of access to open weights for 96GB to 128GB hardware and questions whether a community-driven Large Language Model is feasible.

media r/LocalLLaMA · 3h ago

Sell ddr5 for vram?

A Reddit user asks whether they should sell half of their 768GB DDR5 6400 ECC RAM to purchase RTX 6000 Pro GPUs, citing current RAM prices.

media r/LocalLLaMA · 3h ago

Seeking advice on cases for dual RTX 3090 LLM workstation

A user is building a local LLM workstation using an ASUS Crosshair VIII Hero motherboard and two power-limited RTX 3090 GPUs, seeking recommendations for compatible computer cases.

media r/LocalLLaMA · 3h ago

Qwen3.6 27B local vs Opus 4.8, voxel engine in raw C with zero frameworks

A comparison experiment pitted Claude Code on Opus 4.8 against a locally running Qwen3.6 27B model to build a voxel world engine in plain C without any external frameworks or libraries.

media r/LocalLLaMA · 3h ago

User questions existence of closed vs open LLM rankings and value of 70B-350B models

A Reddit user asks whether a solid leaderboard exists that compares closed-source and open-weight large language models side by side. They note that most available benchmarks feel fragmented and fail to address the practical differences between running models locally versus using API-based services.

media r/LocalLLaMA · 4h ago

Community inquiry on using Q1/Q2 quantization for large language models

A Reddit user asks the community about their experiences using Q1 or Q2 quantization levels for large language models ranging from 100 to 250 billion parameters. The post lists specific models in this size range, such as DeepSeek-V4-Flash and Qwen3-235B-A22B, and contrasts them with smaller models where lower quantization is generally discouraged.

github llama.cpp · 4h ago

llama.cpp b9830 release adds --offline flag and fixes memory bug

The llama.cpp b9830 release introduces the ability to use the --offline flag with the llama download command, allowing scripts to verify cached models without network access. This update also resolves a latent use-after-free vulnerability in the URL-task on_done callback where first_path was incorrectly captured by reference.

media Hugging Face Forums · 4h ago

User Requests Account Recovery for zhoucantd

A user on the Hugging Face forums is asking if it is possible to recover their account, specifically identifying the username "zhoucantd". The post indicates a discussion thread involving two participants regarding this request.

media Hugging Face Forums · 4h ago

UCTF: A Universal Compressed Training Format for Multilingual AI

A new concept called UCTF (Universal Compressed Training Format) proposes a mediator layer to address semantic redundancy in multilingual LLM training by compressing diverse languages into a unified, language-agnostic token format.

media Hugging Face Forums · 4h ago

Creating a Website Chat Widget with Gradio Part IV

A user reports that their previously functional AI chatbot widget on their website has stopped working due to a CORS policy error after a recent Gradio update. The error indicates that the 'Access-Control-Allow-Credentials' header in the response is empty, which conflicts with the client's request credentials mode.

media Hugging Face Forums · 4h ago

The language as carrier of intelligence: Beyond token prediction

This article argues that large language models derive their apparent intelligence from the deep geometric relationships and hidden states within language itself, rather than from independent mechanical computation or simple token prediction.

media r/LocalLLaMA · 5h ago

DuckDuckGo is blocking with a CAPTCHA. Let me try other approaches:

A user on the LocalLLaMA subreddit reports that their local llama.cpp-based LLM began encountering DuckDuckGo CAPTCHA blocks this morning. The article asks if other users are experiencing similar issues with DuckDuckGo's anti-bot measures.

media r/LocalLLaMA · 5h ago

What are companies actually using for self-hosted AI right now, and why?

A Reddit user is soliciting real-world data on enterprise deployments of self-hosted artificial intelligence, distinguishing actual production use from hobbyist testing.

media r/LocalLLaMA · 5h ago

Reddit post highlights biometric requirements for GPT 5.6 Sol preview

A Reddit user shared an image depicting a mock application interface requiring face scanning, fingerprint checking, and passport verification to join the GPT 5.6 Sol preview. The post characterizes these stringent identity verification steps as unusual or "wild" for accessing a model preview.

media r/LocalLLaMA · 6h ago

A barebones CPU-only inference engine for Qwen 3, written from scratch in pure C

A developer has released a pure C implementation of an inference engine specifically designed for Qwen 3 models of size 4B and below. The project is available on GitHub as a learning resource that prioritizes code readability and educational value over raw performance.

media r/LocalLLaMA · 6h ago

We're probably going to need that soon.

This Reddit post shares a meme featuring quotes from Vladik and Shaw on 𝕏 regarding future needs in the field.

media r/LocalLLaMA · 6h ago

Whisperian: Best Android App for Local ASR Models

Whisperian is an Android application that allows users to utilize microphone input with local Automatic Speech Recognition (ASR) models. The app is available for download on the Google Play Store.

github llama.cpp · 8h ago

llama.cpp b9829 Release: Reduced Logs and Multi-Platform Binaries

The llama.cpp project has released version b9829, which includes a reduction of logging output in the server, common components, and speculative decoding modules. This update also standardizes naming conventions by replacing CMN_ with COM_.

media r/LocalLLaMA · 8h ago

Reverse engineered DeepSeek Chat into an OpenAI compatible API

A developer has created a local proxy that reverse-engineers the free DeepSeek consumer web chat to expose an OpenAI-compatible API endpoint at localhost:8000/v1. This tool allows existing OpenAI-compatible clients, such as Open WebUI and various SDKs, to interact with DeepSeek's V4 and R1 models without code changes or API keys.

media r/LocalLLaMA · 9h ago

Qwen3-VL-2B excels at JSON extraction on low-end hardware

A user reports that Qwen3-VL-2B is the only viable vision-language model for reliably extracting data from images to JSON on low-spec devices like Intel i3 laptops with 8GB RAM. The author notes that despite its performance, the model is absent from major benchmarks such as Artificial Analysis and the Open LLM Leaderboard.