All articles — korshunov.ai


Claude Will Soon Require Identity Verification

Anthropic will soon require users to verify their identity to access Claude. The change is intended to enhance security and ensure responsible use of the platform.

media r/LocalLLaMA · 11d ago

R9700 GPU Performance Issues with vLLM and Multi-GPU Setup

A user with two AMD R9700 GPUs reports severe performance issues: vLLM fails to run with tensor parallelism (tp=2) because of NCCL errors, and single-card inference reaches only 30 tokens/s for Qwen 0.6B and 5 tokens/s for a 27B INT4 AWQ model, despite what the user describes as a proper ROCm installation and system configuration.

media r/LocalLLaMA · 11d ago

Why is AutoRound being slept on so hard?

The post argues that AutoRound significantly outperforms standard AWQ and RTN quantization in perplexity and accuracy, especially on complex reasoning and long contexts. It exports natively to GGUF, avoiding conversion issues, and runs on any PyTorch setup, yet it remains underused despite these advantages.

media r/LocalLLaMA · 11d ago

I mapped every agent config file and tagged real adoption

A guide lists 21 agent configuration conventions across 11 categories, each tagged as adopted, emerging, or proposed. It includes real examples from public repositories and explicitly flags hype: llms.txt, for instance, is widely published but remains unconfirmed by major providers.

media r/LocalLLaMA · 11d ago

Proposal for splitting base models to avoid retraining

A proposal suggests splitting model architecture into a stable base model and lightweight, swappable worker models. The base model handles core reasoning and acts as a platform, while worker models provide domain-specific knowledge through runtime hot-plugging, similar to LoRA but for knowledge rather than behavior.

media r/LocalLLaMA · 11d ago

Watch local LLMs escape the rooms you design

A new tool lets users design escape room-style environments and watch local LLMs navigate and escape them using simple actions. Built for Hugging Face x Gradio's 'Build Small' hackathon, the project supports five model presets and custom maps with font-based visuals and JSON import/export. It uses a 'Think then Act' framework to help small models act reliably in structured game environments.

media r/LocalLLaMA · 11d ago

GLM-5.2 Beats Gemini and GPT-5.4 in Coding but Is Inefficient

GLM-5.2 surpasses GPT-5.4 and the entire Gemini lineup in coding performance on the DeepSWE benchmark. However, it uses significantly more output tokens, giving it a substantially higher cost per task than models such as GPT-5.5 and Claude Opus 4.8.

media r/LocalLLaMA · 11d ago

Gemma 4 QAT responds better to KV cache quantization

A Reddit post reports that Gemma 4 QAT responds significantly better to KV cache quantization, as measured on the wikitext dataset with a 16k context. The user notes that their hardware limits their testing of 31B models and invites others to explore the results.

media r/LocalLLaMA · 11d ago

Fable vs GLM 5.2 vs KIMI K2.7 (YouTube Video)

A YouTube video compares the performance of Fable, GLM 5.2, and KIMI K2.7. The r/LocalLLaMA post links to the video and includes related comments.

media r/LocalLLaMA · 11d ago

Vercel CEO says he is 'almost shocked' by GLM-5.2's coding abilities

Guillermo Rauch, CEO of Vercel, stated he is 'genuinely impressed, almost shocked' by GLM-5.2's performance in coding tasks. He shared this feedback in a post on X, highlighting the model's strong capabilities in code generation.

media r/LocalLLaMA · 11d ago

Qwen 3.7 Will Not Be Open Sourced

Following the departure of Junyang Lin, Qwen has stopped open-sourcing its models. As of June 2026, Qwen 3.7 remains fully closed source, and every other major Chinese AI lab has released an open-source model since it came out.

media r/LocalLLaMA · 11d ago

Proposed Feeling Model Uses Only Emojis

A proposed 'feeling model' would think exclusively in emojis. The idea is to create the first model that communicates entirely through emotional emoji expressions.

media r/LocalLLaMA · 12d ago

Kimi AI Just Mailed Me

A user reports receiving an email from Kimi.ai about one of their YouTube videos and shared the message with the r/LocalLLaMA community.

media r/LocalLLaMA · 12d ago

AllenAI releases MolmoMotion vision models for future motion prediction

AllenAI has released two MolmoMotion models that predict 3D point trajectories based on short video histories and natural-language instructions. One model uses a three-frame history, the other a one-frame history, enabling future motion forecasting for objects in 3D space.

media r/LocalLLaMA · 12d ago

SupraLabs Launches Any2Any Model Family

SupraLabs has introduced the Supra-A2A-Nano-Exp model, a 30M-parameter multimodal Transformer that unifies text, image, and video into a single token stream. The model treats all modalities as tokens in a shared sequence, enabling language modeling over a combined vocabulary of 50,520 tokens without separate vision encoders or cross-attention modules.

media r/LocalLLaMA · 12d ago

What are you overengineering that nobody's ever going to use? Be honest.

A Reddit post asks users to admit which features or systems they are overengineering that no one will ever use, encouraging reflection on unnecessary complexity in software development.

github llama.cpp · 12d ago

llama.cpp Release b9744: New Binaries and Features

llama.cpp release b9744 ships updated binaries for macOS, Linux, Android, Windows, and openEuler, covering multiple architectures and backends such as Vulkan, CUDA, OpenVINO, SYCL, and ROCm. A UI package is also available.

media r/LocalLLaMA · 12d ago

Best open-source vision model runnable on RTX 6000 Pro

A user is looking for the best current open-source vision model that can run on an RTX 6000 Pro for OCR and classification of historical scanned documents. They note that Gemma 4 31B performs well, better than Qwen 3.6's vision encoder, and ask for recommendations beyond it.

media r/LocalLLaMA · 12d ago

semantic-memory: local-first knowledge base with typed graph edges

semantic-memory is a local-first knowledge base written in Rust on top of SQLite. It combines BM25 and vector search, merging their results with reciprocal rank fusion, and adds typed graph edges for causal, temporal, and semantic relationships, provenance tracking, bitemporal storage, and adaptive query routing. It exposes 18 MCP tools for AI agents, and all components run locally without cloud dependencies, API keys, or telemetry.
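The summary does not show how semantic-memory fuses its rankings. As a hedged illustration of the general technique only, here is a minimal Python sketch of reciprocal rank fusion: each document scores 1 / (k + rank) in every ranked list it appears in, and the scores are summed. The constant k = 60 is a commonly used default and is assumed here; the `rrf_fuse` function and the document IDs are hypothetical, and this is not semantic-memory's Rust code.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    Each document earns 1 / (k + rank) from every list it appears in
    (ranks are 1-based); the contributions are summed per document.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical result lists from a BM25 search and a vector search.
    bm25_hits = ["doc_a", "doc_b", "doc_c"]
    vector_hits = ["doc_c", "doc_a", "doc_d"]
    for doc_id, score in rrf_fuse([bm25_hits, vector_hits]):
        print(f"{doc_id}: {score:.5f}")
```

With these sample lists, doc_a ranks first because it places high in both the BM25 and vector results, while doc_b and doc_d each appear in only one list. Because RRF uses only ranks, it needs no normalization between BM25 scores and vector similarities.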

media r/LocalLLaMA · 12d ago

What can I run on my Tesla V100 32GB system?

A user with a Tesla V100 32GB GPU in a dual-Xeon Dell PowerEdge 730 with 384GB of DDR4 and several TB of storage asks which local large language models (LLMs) the system can handle for experimentation, including inference and training.