All articles — korshunov.ai

All articles Page 1 / 107

Qwen3-VL-2B excels at JSON extraction on low-end hardware

A user reports that Qwen3-VL-2B is the only viable vision-language model for reliably extracting data from images to JSON on low-spec devices like Intel i3 laptops with 8GB RAM. The author notes that despite its performance, the model is absent from major benchmarks such as Artificial Analysis and the Open LLM Leaderboard.

media r/LocalLLaMA · 4h ago

Clark Labs Releases Ternary-Quantized Sana 1.6B Text-to-Image Model

Clark Labs has released a compressed version of the Sana 1.6B text-to-image transformer, quantized to ternary weights at approximately 1.85 bits per weight. This compression results in a model that is 8.6 times smaller than the standard FP16 version while maintaining near-FP16 quality.

media Hugging Face Forums · 5h ago

User seeks collaborators for a new ML Sudoku dataset project

A user on the Hugging Face forums is seeking collaborators to build a machine learning and deep learning project focused on Sudokus. The author has begun creating a database from scratch and aims to establish an independent organization for this cause.

media r/LocalLLaMA · 5h ago

A Blind Visual Paradigm for Testing Skill Transfer in Small Models Without Fine-Tuning

The author proposes a cross-domain, blind visual experiment to determine if a large language model can compress its procedural planning into a reusable scaffold that enhances a small model's output without fine-tuning. Using Three.js as the testbed, the study aims to prove that this transfer of skill is genuine and not merely overfitting to the source domain.

media r/LocalLLaMA · 5h ago

User builds maxed-out local LLM rig with RTX Pro 5000 and Ryzen 9950X3D

A Reddit user shares the completion of a high-end local AI workstation featuring an NVIDIA RTX Pro 5000 GPU, AMD Ryzen 9 9950X3D CPU, 192GB RAM, and 80GB VRAM. The build was finalized after the user's application for the NVIDIA Inception program was rejected and prices for the RTX Pro 6000 exceeded their budget.

media r/LocalLLaMA · 5h ago

Tested which model can send best HTML email

A user recently deployed the Mailcue tool, which includes an MCP server for email management, and tested three specific models to determine which generates the most visually appealing HTML emails. The models evaluated were google/gemma-4-26b-a4b-qat, qwen/qwen3.6-35b-a3b, and qwen/qwen3.6-27b.

media r/LocalLLaMA · 7h ago

Reddit post: 10x Kaioken SSJ1 4th grade, worth it in 2026? Can it run Qwen3.6?

A Reddit user submitted an image titled "10x Kaioken SSJ1 4th grade, worth it in 2026? Can it run Qwen3.6?" to the r/LocalLLaMA community. The post includes a link to the original image and a link to the comments section for further discussion.

media r/LocalLLaMA · 7h ago

US Ban Benchmark Updated: GPT-5.6 Ties Anthropic

OpenAI's latest model ties with Anthropic in the US Ban benchmark following the preview of GPT-5.6.

media r/LocalLLaMA · 7h ago

Koboldcpp v1.116 released

The Koboldcpp project has released version 1.116, as announced on the LocalLLaMA subreddit and the official GitHub repository.

media r/LocalLLaMA · 7h ago

Blind-graded 55 LLMs: Same-family rating bias is statistically significant

An open evaluation involving 55 models from 11 developer families revealed that large language models exhibit statistically significant in-group bias when blind-grading each other. Across 22,254 valid judgments, every family with sufficient data showed a tendency to rate its own members differently than those of other families.

media r/LocalLLaMA · 7h ago

User asks if 2x RX 9060xt 16GB is worth it for running Qwen 3.6 27B

A user on Reddit inquires whether purchasing two AMD Radeon RX 9060 XT graphics cards with 16GB of VRAM each is a worthwhile investment for running the Qwen 3.6 27B model and similar architectures.

media r/LocalLLaMA · 7h ago

Full document redaction with Qwen 3.6 27B with a Pi agent harness

The author demonstrates that local models, specifically Qwen 3.6 27B, can perform end-to-end document redaction when optimized with a higher quantization level and an agentic harness using the PI framework.

media r/LocalLLaMA · 7h ago

claude_converter: Turn Claude Code sessions into fine-tuning data

The author developed `claude_converter`, a tool that converts local Claude Code `.jsonl` session files into formats compatible with fine-tuning frameworks like TRL, Axolotl, and LLaMA-Factory.

media r/LocalLLaMA · 7h ago

Will Chinese Open Source Models be the only option soon?

A Reddit user argues that US tech companies seek total global control over AI and view the release of advanced models as a threat to that dominance.

media r/LocalLLaMA · 7h ago

Model Registry: Torrents for open models using Hugging Face as a fallback web seed.

A new repository and site called Model Registry has been created to publish and share .torrent files for popular open models, utilizing Hugging Face as a fallback web seed. The project includes scripts to automate the process and a backend service that redirects BitTorrent clients to the correct Hugging Face endpoint.

media r/LocalLLaMA · 7h ago

Home Lab: 4x Modded 4090s for Local LLM Inference

A user details a high-performance local inference setup utilizing four modified NVIDIA RTX 4090 GPUs with 192GB of VRAM, paired with a WRX90E-SAGE SE motherboard and 3000W power supply.

media r/LocalLLaMA · 7h ago

Could AI game upscalers benefit from lightweight game-specific adapters?

A Reddit user proposes that AI upscaling technologies like DLSS and FSR could utilize lightweight, game-specific adapter layers to improve performance on low-power hardware.

media r/LocalLLaMA · 7h ago

Largest model under 64GB VRAM for distillation

A user on Reddit is seeking recommendations for the largest capable reasoning model that fits within a 64 GB VRAM limit for the purpose of knowledge distillation.

media r/LocalLLaMA · 7h ago

Quantization Impact on MTP Draft Acceptance Rates

An analysis of speculative decoding using Gemma 4-31B-it models demonstrates that heavy quantization reduces the token acceptance rate because the main model becomes less consistent with the drafter. Testing across Q5_K_S, IQ4_XS, IQ3_M, and IQ2_M quantizations reveals how draft depth affects performance.

media r/LocalLLaMA · 7h ago

Running GLM5.2 on budget hardware < $2500

A Reddit user demonstrates how to assemble a local AI inference rig for under $2500 using affordable second-hand components, specifically targeting the ability to run large language models like GLM-5.2 without expensive enterprise hardware.