All articles — korshunov.ai

All articles Page 1 / 107

Whisperian: Best Android App for Local ASR Models

Whisperian is an Android application that allows users to utilize microphone input with local Automatic Speech Recognition (ASR) models. The app is available for download on the Google Play Store.

github llama.cpp · 4h ago

llama.cpp b9829 Release: Reduced Logs and Multi-Platform Binaries

The llama.cpp project has released version b9829, which includes a reduction of logging output in the server, common components, and speculative decoding modules. This update also standardizes naming conventions by replacing CMN_ with COM_.

media r/LocalLLaMA · 5h ago

Reverse engineered DeepSeek Chat into an OpenAI compatible API

A developer has created a local proxy that reverse-engineers the free DeepSeek consumer web chat to expose an OpenAI-compatible API endpoint at localhost:8000/v1. This tool allows existing OpenAI-compatible clients, such as Open WebUI and various SDKs, to interact with DeepSeek's V4 and R1 models without code changes or API keys.

media r/LocalLLaMA · 5h ago

Qwen3-VL-2B excels at JSON extraction on low-end hardware

A user reports that Qwen3-VL-2B is the only viable vision-language model for reliably extracting data from images to JSON on low-spec devices like Intel i3 laptops with 8GB RAM. The author notes that despite its performance, the model is absent from major benchmarks such as Artificial Analysis and the Open LLM Leaderboard.

media r/LocalLLaMA · 7h ago

Clark Labs Releases Ternary-Quantized Sana 1.6B Text-to-Image Model

Clark Labs has released a compressed version of the Sana 1.6B text-to-image transformer, quantized to ternary weights at approximately 1.85 bits per weight. This compression results in a model that is 8.6 times smaller than the standard FP16 version while maintaining near-FP16 quality.

media Hugging Face Forums · 7h ago

User seeks collaborators for a new ML Sudoku dataset project

A user on the Hugging Face forums is seeking collaborators to build a machine learning and deep learning project focused on Sudokus. The author has begun creating a database from scratch and aims to establish an independent organization for this cause.

media r/LocalLLaMA · 8h ago

A Blind Visual Paradigm for Testing Skill Transfer in Small Models Without Fine-Tuning

The author proposes a cross-domain, blind visual experiment to determine if a large language model can compress its procedural planning into a reusable scaffold that enhances a small model's output without fine-tuning. Using Three.js as the testbed, the study aims to prove that this transfer of skill is genuine and not merely overfitting to the source domain.

media r/LocalLLaMA · 8h ago

User builds maxed-out local LLM rig with RTX Pro 5000 and Ryzen 9950X3D

A Reddit user shares the completion of a high-end local AI workstation featuring an NVIDIA RTX Pro 5000 GPU, AMD Ryzen 9 9950X3D CPU, 192GB RAM, and 80GB VRAM. The build was finalized after the user's application for the NVIDIA Inception program was rejected and prices for the RTX Pro 6000 exceeded their budget.

media r/LocalLLaMA · 8h ago

Tested which model can send best HTML email

A user recently deployed the Mailcue tool, which includes an MCP server for email management, and tested three specific models to determine which generates the most visually appealing HTML emails. The models evaluated were google/gemma-4-26b-a4b-qat, qwen/qwen3.6-35b-a3b, and qwen/qwen3.6-27b.

media r/LocalLLaMA · 9h ago

Reddit post: 10x Kaioken SSJ1 4th grade, worth it in 2026? Can it run Qwen3.6?

A Reddit user submitted an image titled "10x Kaioken SSJ1 4th grade, worth it in 2026? Can it run Qwen3.6?" to the r/LocalLLaMA community. The post includes a link to the original image and a link to the comments section for further discussion.

media r/LocalLLaMA · 9h ago

US Ban Benchmark Updated: GPT-5.6 Ties Anthropic

OpenAI's latest model ties with Anthropic in the US Ban benchmark following the preview of GPT-5.6.

media r/LocalLLaMA · 9h ago

Koboldcpp v1.116 released

The Koboldcpp project has released version 1.116, as announced on the LocalLLaMA subreddit and the official GitHub repository.

media r/LocalLLaMA · 9h ago

Blind-graded 55 LLMs: Same-family rating bias is statistically significant

An open evaluation involving 55 models from 11 developer families revealed that large language models exhibit statistically significant in-group bias when blind-grading each other. Across 22,254 valid judgments, every family with sufficient data showed a tendency to rate its own members differently than those of other families.

media r/LocalLLaMA · 9h ago

User asks if 2x RX 9060xt 16GB is worth it for running Qwen 3.6 27B

A user on Reddit inquires whether purchasing two AMD Radeon RX 9060 XT graphics cards with 16GB of VRAM each is a worthwhile investment for running the Qwen 3.6 27B model and similar architectures.

media r/LocalLLaMA · 9h ago

Full document redaction with Qwen 3.6 27B with a Pi agent harness

The author demonstrates that local models, specifically Qwen 3.6 27B, can perform end-to-end document redaction when optimized with a higher quantization level and an agentic harness using the PI framework.

media r/LocalLLaMA · 9h ago

claude_converter: Turn Claude Code sessions into fine-tuning data

The author developed `claude_converter`, a tool that converts local Claude Code `.jsonl` session files into formats compatible with fine-tuning frameworks like TRL, Axolotl, and LLaMA-Factory.

media r/LocalLLaMA · 9h ago

Will Chinese Open Source Models be the only option soon?

A Reddit user argues that US tech companies seek total global control over AI and view the release of advanced models as a threat to that dominance.

media r/LocalLLaMA · 9h ago

Model Registry: Torrents for open models using Hugging Face as a fallback web seed.

A new repository and site called Model Registry has been created to publish and share .torrent files for popular open models, utilizing Hugging Face as a fallback web seed. The project includes scripts to automate the process and a backend service that redirects BitTorrent clients to the correct Hugging Face endpoint.

media r/LocalLLaMA · 10h ago

Home Lab: 4x Modded 4090s for Local LLM Inference

A user details a high-performance local inference setup utilizing four modified NVIDIA RTX 4090 GPUs with 192GB of VRAM, paired with a WRX90E-SAGE SE motherboard and 3000W power supply.

media r/LocalLLaMA · 10h ago

Could AI game upscalers benefit from lightweight game-specific adapters?

A Reddit user proposes that AI upscaling technologies like DLSS and FSR could utilize lightweight, game-specific adapter layers to improve performance on low-power hardware.