All articles — korshunov.ai

Koboldcpp v1.116 released

The Koboldcpp project has released version 1.116, as announced on the LocalLLaMA subreddit and the official GitHub repository.

media r/LocalLLaMA · 9h ago

Blind-graded 55 LLMs: Same-family rating bias is statistically significant

An open evaluation involving 55 models from 11 developer families revealed that large language models exhibit statistically significant in-group bias when blind-grading each other. Across 22,254 valid judgments, every family with sufficient data showed a tendency to rate its own members differently than those of other families.

media r/LocalLLaMA · 9h ago

User asks if 2x RX 9060xt 16GB is worth it for running Qwen 3.6 27B

A user on Reddit inquires whether purchasing two AMD Radeon RX 9060 XT graphics cards with 16GB of VRAM each is a worthwhile investment for running the Qwen 3.6 27B model and similar architectures.

media r/LocalLLaMA · 9h ago

Full document redaction with Qwen 3.6 27B with a Pi agent harness

The author demonstrates that local models, specifically Qwen 3.6 27B, can perform end-to-end document redaction when run at a higher quantization level and paired with an agentic harness built on the Pi framework.

media r/LocalLLaMA · 9h ago

claude_converter: Turn Claude Code sessions into fine-tuning data

The author developed `claude_converter`, a tool that converts local Claude Code `.jsonl` session files into formats compatible with fine-tuning frameworks like TRL, Axolotl, and LLaMA-Factory.

media r/LocalLLaMA · 9h ago

Will Chinese Open Source Models be the only option soon?

A Reddit user argues that US tech companies seek total global control over AI and view the release of advanced models as a threat to that dominance.

media r/LocalLLaMA · 9h ago

Model Registry: Torrents for open models using Hugging Face as a fallback web seed

A new repository and site called Model Registry has been created to publish and share .torrent files for popular open models, utilizing Hugging Face as a fallback web seed. The project includes scripts to automate the process and a backend service that redirects BitTorrent clients to the correct Hugging Face endpoint.

media r/LocalLLaMA · 10h ago

Home Lab: 4x Modded 4090s for Local LLM Inference

A user details a high-performance local inference setup built on four modified NVIDIA RTX 4090 GPUs with 192GB of VRAM in total, paired with a WRX90E-SAGE SE motherboard and a 3000W power supply.

media r/LocalLLaMA · 10h ago

Could AI game upscalers benefit from lightweight game-specific adapters?

A Reddit user proposes that AI upscaling technologies like DLSS and FSR could utilize lightweight, game-specific adapter layers to improve performance on low-power hardware.

media r/LocalLLaMA · 10h ago

Largest model under 64GB VRAM for distillation

A user on Reddit is seeking recommendations for the largest capable reasoning model that fits within a 64 GB VRAM limit for the purpose of knowledge distillation.

media r/LocalLLaMA · 10h ago

Quantization Impact on MTP Draft Acceptance Rates

An analysis of speculative decoding using Gemma 4-31B-it models demonstrates that heavy quantization reduces the token acceptance rate because the main model becomes less consistent with the drafter. Testing across Q5_K_S, IQ4_XS, IQ3_M, and IQ2_M quantizations reveals how draft depth affects performance.

media r/LocalLLaMA · 10h ago
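
For readers unfamiliar with the metric in the speculative decoding item above, the following is a minimal sketch of how a draft acceptance rate can be computed. It assumes greedy verification, where a drafted token is accepted only if it matches the token the main model would have produced at that position. The function names and token IDs are hypothetical and are not taken from the original post, which may measure acceptance differently.

```python
from typing import Sequence


def accepted_prefix_length(draft: Sequence[int], target: Sequence[int]) -> int:
    """Count drafted tokens accepted before the first mismatch.

    Assumes greedy verification: everything after the first mismatch
    is discarded, even if later positions happen to agree.
    """
    accepted = 0
    for drafted_token, target_token in zip(draft, target):
        if drafted_token != target_token:
            break
        accepted += 1
    return accepted


def acceptance_rate(steps: Sequence[tuple[Sequence[int], Sequence[int]]]) -> float:
    """Fraction of all drafted tokens that the main model accepted."""
    drafted = sum(len(draft) for draft, _ in steps)
    accepted = sum(accepted_prefix_length(draft, target) for draft, target in steps)
    return accepted / drafted if drafted else 0.0


# Hypothetical token IDs, invented for illustration only.
steps = [
    ([5, 9, 2, 7], [5, 9, 2, 7]),  # all four drafted tokens accepted
    ([3, 3, 8, 1], [3, 4, 8, 1]),  # mismatch at position 1, so one accepted
]
print(acceptance_rate(steps))  # 5 accepted out of 8 drafted
```

Under this definition, a main model whose outputs drift away from the drafter's (for example after heavy quantization) produces earlier mismatches, so fewer drafted tokens per step survive verification.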

Running GLM5.2 on budget hardware < $2500

A Reddit user demonstrates how to assemble a local AI inference rig for under $2500 using affordable second-hand components, specifically targeting the ability to run large language models like GLM-5.2 without expensive enterprise hardware.

media r/LocalLLaMA · 10h ago

User reports Ornith 35B outperforms Qwen in 3D game generation

A Reddit user shares their experience using the Claude Code harness to generate a 3D game with the Ornith 35B model. After three prompts, the model successfully produced the requested output, whereas the Qwen3.5-35b-a3b model failed to do so even after multiple attempts.

media r/LocalLLaMA · 10h ago

Observations on the decline of fine-tuning discussions for consumer hardware

A Reddit user notes that interest in fine-tuning models on consumer-grade hardware appears to have decreased since the release of capable generalist models like Llama-3-8b. The author suggests that improved base model intelligence reduces the necessity for fine-tuning, as prompt engineering often suffices.

media r/LocalLLaMA · 10h ago

Google runs hackathons for small models like Gemma 4 31B

Google is organizing hackathons focused on small language models, specifically Gemma 4 31B, to demonstrate their value in AI-assisted software engineering. The initiative highlights the company's continued belief in the utility of smaller models despite the industry trend toward larger ones.

media r/LocalLLaMA · 10h ago

Mythos was the first, now GPT-5.6

A Reddit post discusses OpenAI's GPT-5.6 model and the limits placed on its rollout following a government request.

media r/LocalLLaMA · 10h ago

Welp ... I bought my Wife a Diet Pepsi.

A Reddit user in the r/LocalLLaMA community shared an image with the caption "Happy wife happy life as they say." The post is a personal anecdote about purchasing a Diet Pepsi for the user's wife.

media r/LocalLLaMA · 11h ago

ObviousBench: A Benchmark for Visible LLM Failures in Smaller Models

ObviousBench is a new benchmark designed to evaluate visible failures in large language models, focusing on how configuration choices impact error rates. The tool highlights the trade-offs between model size, speed, and reasoning capabilities rather than just ranking performance.

media r/LocalLLaMA · 11h ago

Cory Doctorow Interview on AI and Local AI Advocacy

A Reddit post shares an Ars Technica interview with Cory Doctorow about his views on artificial intelligence. The original poster highlights the article's critical stance on major tech companies attempting to go public.

media r/LocalLLaMA · 11h ago

SupraLabs Releases SupraSafety-18M, a Tiny Content-Moderation Model

SupraLabs has released SupraSafety-18M, a BERT-style binary text classifier with 18 million parameters designed for content moderation on edge devices and mobile phones. The model was trained from scratch on the nvidia/Nemotron-3.5-Content-Safety-Dataset and achieves an accuracy of 81.2% and precision of 86.9%.