All articles — korshunov.ai

Community inquiry on using Q1/Q2 quantization for large language models

A Reddit user asks the community about their experiences running large language models of 100 to 250 billion parameters at Q1 or Q2 quantization. The post lists specific models in this size range, such as DeepSeek-V4-Flash and Qwen3-235B-A22B, and contrasts them with smaller models, where such aggressive low-bit quantization is generally discouraged.

github llama.cpp · 3h ago

llama.cpp b9830 release adds --offline flag and fixes memory bug

The llama.cpp b9830 release introduces the ability to use the --offline flag with the llama download command, allowing scripts to verify cached models without network access. This update also resolves a latent use-after-free vulnerability in the URL-task on_done callback where first_path was incorrectly captured by reference.

media Hugging Face Forums · 3h ago

User Requests Account Recovery for zhoucantd

A user on the Hugging Face forums asks whether their account, registered under the username "zhoucantd", can be recovered. Two participants have posted in the thread so far.

media Hugging Face Forums · 3h ago

UCTF: A Universal Compressed Training Format for Multilingual AI

A new concept called UCTF (Universal Compressed Training Format) proposes a mediator layer to address semantic redundancy in multilingual LLM training by compressing diverse languages into a unified, language-agnostic token format.

media Hugging Face Forums · 3h ago

Creating a Website Chat Widget with Gradio Part IV

A user reports that their previously functional AI chatbot widget on their website has stopped working due to a CORS policy error after a recent Gradio update. The error indicates that the 'Access-Control-Allow-Credentials' header in the response is empty, which conflicts with the client's request credentials mode.
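
The browser rule behind this kind of error can be sketched in Python. A cross-origin request sent with credentials is readable by the page only if the response sets Access-Control-Allow-Credentials to the exact string "true" and Access-Control-Allow-Origin names the requesting origin rather than the "*" wildcard. The function name and the example origin are hypothetical, and the check is a simplified illustration of the rule, not Gradio's or any browser's actual code.

```python
def credentialed_cors_ok(response_headers: dict, request_origin: str) -> bool:
    """Return True if a credentialed cross-origin response would pass the CORS check.

    Simplified illustration: only the two headers relevant to the reported
    error are inspected.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    headers = {name.lower(): value.strip() for name, value in response_headers.items()}

    allow_origin = headers.get("access-control-allow-origin", "")
    allow_credentials = headers.get("access-control-allow-credentials", "")

    # With credentials, the allowed origin must match exactly; "*" is rejected.
    if allow_origin != request_origin:
        return False
    # The credentials header must be the literal, case-sensitive string "true".
    return allow_credentials == "true"


origin = "https://example.com"  # hypothetical site embedding the widget

# An empty credentials header, as in the reported error, fails the check.
broken = {"Access-Control-Allow-Origin": origin, "Access-Control-Allow-Credentials": ""}
# The same response with the header set to "true" passes.
fixed = {"Access-Control-Allow-Origin": origin, "Access-Control-Allow-Credentials": "true"}

print(credentialed_cors_ok(broken, origin))  # False
print(credentialed_cors_ok(fixed, origin))   # True
```

Under this rule, the fix belongs on the server side (or in a proxy in front of it): the response has to carry the header with the value "true", or the client has to stop sending credentials.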

media Hugging Face Forums · 3h ago

The language as carrier of intelligence: Beyond token prediction

This article argues that large language models derive their apparent intelligence from the deep geometric relationships and hidden states within language itself, rather than from independent mechanical computation or simple token prediction.

media r/LocalLLaMA · 4h ago

DuckDuckGo is blocking with a CAPTCHA. Let me try other approaches:

A user on the LocalLLaMA subreddit reports that their local llama.cpp-based LLM began encountering DuckDuckGo CAPTCHA blocks this morning. The post asks whether other users are experiencing similar issues with DuckDuckGo's anti-bot measures.

media r/LocalLLaMA · 4h ago

What are companies actually using for self-hosted AI right now, and why?

A Reddit user is soliciting real-world data on enterprise deployments of self-hosted artificial intelligence, distinguishing actual production use from hobbyist testing.

media r/LocalLLaMA · 4h ago

Reddit post highlights biometric requirements for GPT 5.6 Sol preview

A Reddit user shared an image depicting a mock application interface requiring face scanning, fingerprint checking, and passport verification to join the GPT 5.6 Sol preview. The post characterizes these stringent identity verification steps as unusual or "wild" for accessing a model preview.

media r/LocalLLaMA · 5h ago

A barebones CPU-only inference engine for Qwen 3, written from scratch in pure C

A developer has released a pure C implementation of an inference engine specifically designed for Qwen 3 models of size 4B and below. The project is available on GitHub as a learning resource that prioritizes code readability and educational value over raw performance.

media r/LocalLLaMA · 5h ago

We're probably going to need that soon.

This Reddit post shares a meme featuring quotes from Vladik and Shaw on 𝕏 regarding future needs in the field.

media r/LocalLLaMA · 5h ago

Whisperian: Best Android App for Local ASR Models

Whisperian is an Android application that allows users to utilize microphone input with local Automatic Speech Recognition (ASR) models. The app is available for download on the Google Play Store.

github llama.cpp · 7h ago

llama.cpp b9829 Release: Reduced Logs and Multi-Platform Binaries

The llama.cpp project has released version b9829, which includes a reduction of logging output in the server, common components, and speculative decoding modules. This update also standardizes naming conventions by replacing CMN_ with COM_.

media r/LocalLLaMA · 7h ago

Reverse engineered DeepSeek Chat into an OpenAI compatible API

A developer has created a local proxy that reverse-engineers the free DeepSeek consumer web chat to expose an OpenAI-compatible API endpoint at localhost:8000/v1. This tool allows existing OpenAI-compatible clients, such as Open WebUI and various SDKs, to interact with DeepSeek's V4 and R1 models without code changes or API keys.

media r/LocalLLaMA · 8h ago

Qwen3-VL-2B excels at JSON extraction on low-end hardware

A user reports that Qwen3-VL-2B is the only viable vision-language model for reliably extracting data from images to JSON on low-spec devices like Intel i3 laptops with 8GB RAM. The author notes that despite its performance, the model is absent from major benchmarks such as Artificial Analysis and the Open LLM Leaderboard.

media r/LocalLLaMA · 9h ago

Clark Labs Releases Ternary-Quantized Sana 1.6B Text-to-Image Model

Clark Labs has released a compressed version of the Sana 1.6B text-to-image transformer, quantized to ternary weights at approximately 1.85 bits per weight. This compression results in a model that is 8.6 times smaller than the standard FP16 version while maintaining near-FP16 quality.
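
The quoted figures are consistent with simple arithmetic, sketched below in Python. The calculation assumes that every one of the 1.6 billion parameters is stored at the stated bit width and ignores file metadata and any layers kept at higher precision, so real file sizes will differ. The function names are hypothetical.

```python
FP16_BITS_PER_WEIGHT = 16.0
TERNARY_BITS_PER_WEIGHT = 1.85  # approximate figure quoted in the summary
NUM_PARAMS = 1.6e9              # nominal parameter count of Sana 1.6B


def compression_ratio(baseline_bits: float, compressed_bits: float) -> float:
    """Size ratio between two storage formats, given bits per weight."""
    return baseline_bits / compressed_bits


def weight_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


ratio = compression_ratio(FP16_BITS_PER_WEIGHT, TERNARY_BITS_PER_WEIGHT)
fp16_gb = weight_size_gb(NUM_PARAMS, FP16_BITS_PER_WEIGHT)
ternary_gb = weight_size_gb(NUM_PARAMS, TERNARY_BITS_PER_WEIGHT)

print(f"compression ratio: {ratio:.1f}x")   # 8.6x
print(f"FP16 weights:      {fp16_gb:.2f} GB")
print(f"ternary weights:   {ternary_gb:.2f} GB")
```

Under these assumptions, 16 / 1.85 is about 8.6, which matches the reduction stated in the summary.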

media Hugging Face Forums · 10h ago

User seeks collaborators for a new ML Sudoku dataset project

A user on the Hugging Face forums is seeking collaborators for a machine learning and deep learning project focused on Sudoku puzzles. The author has begun building a dataset from scratch and aims to establish an independent organization around the effort.

media r/LocalLLaMA · 10h ago

A Blind Visual Paradigm for Testing Skill Transfer in Small Models Without Fine-Tuning

The author proposes a cross-domain, blind visual experiment to determine if a large language model can compress its procedural planning into a reusable scaffold that enhances a small model's output without fine-tuning. Using Three.js as the testbed, the study aims to prove that this transfer of skill is genuine and not merely overfitting to the source domain.

media r/LocalLLaMA · 10h ago

User builds maxed-out local LLM rig with RTX Pro 5000 and Ryzen 9950X3D

A Reddit user shares the completion of a high-end local AI workstation featuring an NVIDIA RTX Pro 5000 GPU, AMD Ryzen 9 9950X3D CPU, 192GB RAM, and 80GB VRAM. The build was finalized after the user's application for the NVIDIA Inception program was rejected and prices for the RTX Pro 6000 exceeded their budget.

media r/LocalLLaMA · 10h ago

Tested which model can send best HTML email

A user recently deployed the Mailcue tool, which includes an MCP server for email management, and tested three specific models to determine which generates the most visually appealing HTML emails. The models evaluated were google/gemma-4-26b-a4b-qat, qwen/qwen3.6-35b-a3b, and qwen/qwen3.6-27b.