All articles — korshunov.ai

All articles Page 1 / 108

What are companies actually using for self-hosted AI right now, and why?

A Reddit user is soliciting real-world data on enterprise deployments of self-hosted artificial intelligence, distinguishing actual production use from hobbyist testing.

media r/LocalLLaMA · 10h ago

Reddit post highlights biometric requirements for GPT 5.6 Sol preview

A Reddit user shared an image depicting a mock application interface requiring face scanning, fingerprint checking, and passport verification to join the GPT 5.6 Sol preview. The post characterizes these stringent identity verification steps as unusual or "wild" for accessing a model preview.

media r/LocalLLaMA · 10h ago

A barebones CPU-only inference engine for Qwen 3, written from scratch in pure C

A developer has released a pure C implementation of an inference engine specifically designed for Qwen 3 models of size 4B and below. The project is available on GitHub as a learning resource that prioritizes code readability and educational value over raw performance.

media r/LocalLLaMA · 10h ago

We're probably going to need that soon.

This Reddit post shares a meme featuring quotes from Vladik and Shaw on 𝕏 regarding future needs in the field.

media r/LocalLLaMA · 11h ago

Whisperian: Best Android App for Local ASR Models

Whisperian is an Android application that allows users to utilize microphone input with local Automatic Speech Recognition (ASR) models. The app is available for download on the Google Play Store.

github llama.cpp · 12h ago

llama.cpp b9829 Release: Reduced Logs and Multi-Platform Binaries

The llama.cpp project has released version b9829, which includes a reduction of logging output in the server, common components, and speculative decoding modules. This update also standardizes naming conventions by replacing CMN_ with COM_.

media r/LocalLLaMA · 13h ago

Reverse engineered DeepSeek Chat into an OpenAI compatible API

A developer has created a local proxy that reverse-engineers the free DeepSeek consumer web chat to expose an OpenAI-compatible API endpoint at localhost:8000/v1. This tool allows existing OpenAI-compatible clients, such as Open WebUI and various SDKs, to interact with DeepSeek's V4 and R1 models without code changes or API keys.

media r/LocalLLaMA · 13h ago

Qwen3-VL-2B excels at JSON extraction on low-end hardware

A user reports that Qwen3-VL-2B is the only viable vision-language model for reliably extracting data from images to JSON on low-spec devices like Intel i3 laptops with 8GB RAM. The author notes that despite its performance, the model is absent from major benchmarks such as Artificial Analysis and the Open LLM Leaderboard.

media r/LocalLLaMA · 15h ago

Clark Labs Releases Ternary-Quantized Sana 1.6B Text-to-Image Model

Clark Labs has released a compressed version of the Sana 1.6B text-to-image transformer, quantized to ternary weights at approximately 1.85 bits per weight. This compression results in a model that is 8.6 times smaller than the standard FP16 version while maintaining near-FP16 quality.

media Hugging Face Forums · 15h ago

User seeks collaborators for a new ML Sudoku dataset project

A user on the Hugging Face forums is seeking collaborators to build a machine learning and deep learning project focused on Sudokus. The author has begun creating a database from scratch and aims to establish an independent organization for this cause.

media r/LocalLLaMA · 16h ago

A Blind Visual Paradigm for Testing Skill Transfer in Small Models Without Fine-Tuning

The author proposes a cross-domain, blind visual experiment to determine if a large language model can compress its procedural planning into a reusable scaffold that enhances a small model's output without fine-tuning. Using Three.js as the testbed, the study aims to prove that this transfer of skill is genuine and not merely overfitting to the source domain.

media r/LocalLLaMA · 16h ago

User builds maxed-out local LLM rig with RTX Pro 5000 and Ryzen 9950X3D

A Reddit user shares the completion of a high-end local AI workstation featuring an NVIDIA RTX Pro 5000 GPU, AMD Ryzen 9 9950X3D CPU, 192GB RAM, and 80GB VRAM. The build was finalized after the user's application for the NVIDIA Inception program was rejected and prices for the RTX Pro 6000 exceeded their budget.

media r/LocalLLaMA · 16h ago

Tested which model can send best HTML email

A user recently deployed the Mailcue tool, which includes an MCP server for email management, and tested three specific models to determine which generates the most visually appealing HTML emails. The models evaluated were google/gemma-4-26b-a4b-qat, qwen/qwen3.6-35b-a3b, and qwen/qwen3.6-27b.

media r/LocalLLaMA · 17h ago

Reddit post: 10x Kaioken SSJ1 4th grade, worth it in 2026? Can it run Qwen3.6?

A Reddit user submitted an image titled "10x Kaioken SSJ1 4th grade, worth it in 2026? Can it run Qwen3.6?" to the r/LocalLLaMA community. The post includes a link to the original image and a link to the comments section for further discussion.

media r/LocalLLaMA · 17h ago

US Ban Benchmark Updated: GPT-5.6 Ties Anthropic

OpenAI's latest model ties with Anthropic in the US Ban benchmark following the preview of GPT-5.6.

media r/LocalLLaMA · 17h ago

Koboldcpp v1.116 released

The Koboldcpp project has released version 1.116, as announced on the LocalLLaMA subreddit and the official GitHub repository.

media r/LocalLLaMA · 17h ago

Blind-graded 55 LLMs: Same-family rating bias is statistically significant

An open evaluation involving 55 models from 11 developer families revealed that large language models exhibit statistically significant in-group bias when blind-grading each other. Across 22,254 valid judgments, every family with sufficient data showed a tendency to rate its own members differently than those of other families.

media r/LocalLLaMA · 17h ago

What are companies actually using for self-hosted AI right now, and why?

Reddit post highlights biometric requirements for GPT 5.6 Sol preview

A barebones CPU-only inference engine for Qwen 3, written from scratch in pure C

We're probably going to need that soon.

Whisperian: Best Android App for Local ASR Models

llama.cpp b9829 Release: Reduced Logs and Multi-Platform Binaries

Reverse engineered DeepSeek Chat into an OpenAI compatible API

Qwen3-VL-2B excels at JSON extraction on low-end hardware

Clark Labs Releases Ternary-Quantized Sana 1.6B Text-to-Image Model

User seeks collaborators for a new ML Sudoku dataset project

A Blind Visual Paradigm for Testing Skill Transfer in Small Models Without Fine-Tuning

User builds maxed-out local LLM rig with RTX Pro 5000 and Ryzen 9950X3D

Tested which model can send best HTML email

Reddit post: 10x Kaioken SSJ1 4th grade, worth it in 2026? Can it run Qwen3.6?

US Ban Benchmark Updated: GPT-5.6 Ties Anthropic

Koboldcpp v1.116 released

Blind-graded 55 LLMs: Same-family rating bias is statistically significant

User asks if 2x RX 9060xt 16GB is worth it for running Qwen 3.6 27B

Full document redaction with Qwen 3.6 27B with a Pi agent harness

claude_converter: Turn Claude Code sessions into fine-tuning data