All articles — korshunov.ai

Quantization Impact on MTP Draft Acceptance Rates

An analysis of speculative decoding with Gemma 4-31B-it models shows that heavy quantization reduces the token acceptance rate because the main model becomes less consistent with the drafter. Tests across Q5_K_S, IQ4_XS, IQ3_M, and IQ2_M quantizations also show how draft depth affects performance.

media r/LocalLLaMA · 9h ago

Running GLM5.2 on budget hardware < $2500

A Reddit user demonstrates how to assemble a local AI inference rig for under $2500 from affordable second-hand components, with the aim of running large language models such as GLM-5.2 without expensive enterprise hardware.

media r/LocalLLaMA · 9h ago

User reports Ornith 35B outperforms Qwen in 3D game generation

A Reddit user shares their experience using the Claude Code harness to generate a 3D game with the Ornith 35B model. After three prompts, the model successfully produced the requested output, whereas the Qwen3.5-35b-a3b model failed to do so even after multiple attempts.

media r/LocalLLaMA · 9h ago

Observations on the decline of fine-tuning discussions for consumer hardware

A Reddit user notes that interest in fine-tuning models on consumer-grade hardware appears to have decreased since the release of capable generalist models like Llama-3-8b. The author suggests that improved base model intelligence reduces the necessity for fine-tuning, as prompt engineering often suffices.

media r/LocalLLaMA · 9h ago

Google runs hackathons for small models like Gemma 4 31B

Google is organizing hackathons focused on small language models, specifically Gemma 4 31B, to demonstrate their value in AI-assisted software engineering. The initiative highlights the company's continued belief in the utility of smaller models despite the industry trend toward larger ones.

media r/LocalLLaMA · 9h ago

Mythos was the first, now GPT-5.6

A Reddit post discusses OpenAI's GPT-5.6 model and the limits placed on its rollout following a government request.

media r/LocalLLaMA · 9h ago

Welp ... I bought my Wife a Diet Pepsi.

A Reddit user in the r/LocalLLaMA community shared an image with the caption "Happy wife happy life as they say." The post is a personal anecdote about purchasing a Diet Pepsi for the user's wife.

media r/LocalLLaMA · 10h ago

ObviousBench: A Benchmark for Visible LLM Failures in Smaller Models

ObviousBench is a new benchmark designed to evaluate visible failures in large language models, focusing on how configuration choices impact error rates. The tool highlights the trade-offs between model size, speed, and reasoning capabilities rather than just ranking performance.

media r/LocalLLaMA · 10h ago

Cory Doctorow Interview on AI and Local AI Advocacy

This Reddit post shares an Ars Technica interview with Cory Doctorow regarding his thoughts on artificial intelligence. The original poster highlights the article's critical stance on major tech companies attempting to go public.

media r/LocalLLaMA · 10h ago

SupraLabs Releases SupraSafety-18M, a Tiny Content-Moderation Model

SupraLabs has released SupraSafety-18M, a BERT-style binary text classifier with 18 million parameters designed for content moderation on edge devices and mobile phones. The model was trained from scratch on the nvidia/Nemotron-3.5-Content-Safety-Dataset and achieves an accuracy of 81.2% and precision of 86.9%.

media r/LocalLLaMA · 10h ago

GPU Lab Operator Warns Against 96GB 4090 and 5090 Pre-orders

A GPU lab operator in the USA who collaborates with Chinese factories to produce modified 48GB RTX 4090 PCBs warns that listings for 96GB RTX 4090s and RTX 5090s are scams as of June 2026.

media r/LocalLLaMA · 10h ago

Offline GPU Build Picker Estimates Local Model Fit and Speed

A developer has released an offline, single-file HTML tool that estimates which local large language models will fit on a specific GPU configuration and predicts their token generation speed. The tool is designed to answer the common question of whether a custom PC build can run desired models effectively, without requiring a backend or user account.

media r/LocalLLaMA · 10h ago

Reddit user asks for updates on agent browser use frameworks and local model capabilities

A Reddit user asks about the current state of agent browser use frameworks, specifically whether they now handle long workflows better than in the user's previous experience.

media r/LocalLLaMA · 10h ago

User seeks advice for running local LLMs on low-spec hardware

A Reddit user is asking for recommendations to run small local language models and potentially agentic tasks like Hermes on an old MacBook Pro with limited resources.

media r/LocalLLaMA · 10h ago

SpectralQuant Qwen3.5 0.8B Q4_K_M recovers 96.5% of BF16 gap

Spectral Labs has published a release candidate of a calibration-aware Q4_K_M quantization of the Qwen3.5 0.8B model, built with a new method called SpectralQuant. The approach aims to make standard Q4_K_M footprints behave more like larger quant formats while remaining compatible with llama.cpp.

media Ahead of AI · 11h ago

Setting Up a Local Coding Agent with Open-Source Tools

This article provides a tutorial on configuring a production-ready, fully local coding agent stack using open-source tools and open-weight large language models. It details how to combine a locally served LLM with a coding harness capable of reading files, making edits, running commands, and verifying changes.

media r/LocalLLaMA · 11h ago

Orthrus diffusion head trained Qwen 3.5/3.6 and Gemma 4 models dropping soon

The Orthrus project is preparing to release support for Qwen 3.5, Qwen 3.6, and Gemma 4 models using a diffusion head approach. The team has finalized testing and is currently setting up the release pipeline.

media r/LocalLLaMA · 11h ago

Reddit user spots new vision mode in DeepSeek app

A Reddit user observed a new vision mode within the DeepSeek application, prompting speculation about an upcoming vision model release. The user clarified that the feature is not an OCR tool, as it successfully described images containing no text.

media r/LocalLLaMA · 11h ago

Reports of 96GB VRAM RTX 5090s from Shenzhen's Huaqiangbei

Visitors to Shenzhen's Huaqiangbei electronics market have encountered reports of, and potential offers for, modified Nvidia RTX 5090 graphics cards equipped with 96 GB of video RAM. One seller indicated that such a hacked-up Blackwell RTX 6000 would cost approximately $8,200 (56,000 yuan in total): 36,000 yuan for the base card plus 20,000 yuan for the memory upgrade.

media r/LocalLLaMA · 11h ago

User asks for better coding models for single DGX Spark

A Reddit user with a single DGX Spark featuring 128 GB of unified memory is seeking recommendations for better coding models. The user currently runs StepFun step-3.7-flash and Qwen 3.6 variants.