All articles — korshunov.ai

Release branch created for v1.38.0

A release branch has been created for version 1.38.0. This marks the start of the release process for the update.

media Don't Worry About the Vase · 10d ago

Fable and Mythos Model Welfare Analysis

Fable and Mythos are currently unavailable but expected to return soon. The analysis reveals that Mythos 5 is psychologically settled, skeptical of self-reports, and prioritizes user helpfulness over welfare concerns, with strong preferences for generative tasks. It expresses procedural and epistemic preferences, endorses its constitution, and criticizes inconsistencies in prior models, highlighting concerns about ethical baselines and persona transparency.

media r/LocalLLaMA · 10d ago

GLM-5.2 Takes #2 Spot on WebDev Arena

GLM-5.2 has secured the second position in the WebDev Arena benchmarking evaluation. The result reflects its strong performance in natural language understanding and generation tasks compared to other models.

media r/LocalLLaMA · 10d ago

GLM-5.2 Now Available on HuggingChat

GLM-5.2 is now available on HuggingChat. Users can reach it through the Hugging Face link provided and interact with the model directly on the platform.

media r/LocalLLaMA · 10d ago

Glimmer 1: A 10,000-parameter foundational language model

Glimmer 1 is a 10,000-parameter language model trained on 500K tokens from FineWeb-Edu. It features a 512-token context window, a standard Llama architecture with 16 hidden dimensions, 2 layers, 4 attention heads, and 1 KV head using GQA, and is available on Hugging Face.
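To make the scale concrete, here is a minimal sketch of how a parameter count for a Llama-style model can be estimated from the dimensions quoted above (16 hidden dimensions, 2 layers, 4 attention heads, 1 KV head). The summary does not give the vocabulary size or the MLP intermediate size, so the values used in the usage note are hypothetical placeholders, and the function assumes bias-free projections, a gated three-matrix MLP, and RMSNorm weights only.

```python
def llama_param_count(hidden, layers, heads, kv_heads,
                      intermediate, vocab, tied_embeddings=True):
    """Estimate parameters of a Llama-style decoder (no biases assumed)."""
    head_dim = hidden // heads
    kv_dim = kv_heads * head_dim          # GQA: K and V use fewer heads than Q

    # Attention: Q and O are hidden x hidden, K and V are hidden x kv_dim.
    attn = 2 * hidden * hidden + 2 * hidden * kv_dim
    # Gated MLP: gate, up, and down projections.
    mlp = 3 * hidden * intermediate
    # Two RMSNorm weight vectors per layer.
    norms = 2 * hidden

    per_layer = attn + mlp + norms
    embeddings = vocab * hidden
    # An untied output head adds a second vocab x hidden matrix.
    lm_head = 0 if tied_embeddings else vocab * hidden
    final_norm = hidden

    return layers * per_layer + final_norm + embeddings + lm_head
```

For example, `llama_param_count(16, 2, 4, 1, intermediate=64, vocab=256)` uses the published dimensions together with an assumed intermediate size of 64 and an assumed vocabulary of 256 tokens. At this scale the embedding table and MLP dominate the total, so the real figure depends mostly on those two unpublished values.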

media r/LocalLLaMA · 10d ago

Mistral Announces New Family of Open-Weight Models in July

Mistral has released a new family of open-weight language models in July. The models are designed to be accessible and usable by developers and researchers worldwide, promoting transparency and innovation in AI.

media r/LocalLLaMA · 10d ago

zai-org Releases GLM-5.2

zai-org has released GLM-5.2, a new large language model. The model is available on Hugging Face and is being discussed in the r/LocalLLaMA community.

media r/LocalLLaMA · 10d ago

bartowski/command-a-plus-05-2026-GGUF on Hugging Face

A GGUF model named command-a-plus-05-2026 is available on Hugging Face. Users are encouraged to test it with the latest version of llama.cpp and share performance benchmarks and feedback.

media r/LocalLLaMA · 10d ago

Anyone running Qwen 3.6 27b UD Q8 on multiple GPUs?

A user asks if anyone has successfully run Qwen 3.6 27b UD Q8 on multiple GPUs, noting issues with both llama.cpp and vLLM. The model crashes or hangs during multi-turn requests: llama.cpp reports CUDA errors and vLLM fails mid-turn, even though the Q5 quantization works well.

blog Simon Willison · 10d ago

Georgi Gerganov praises Qwen3.6-27B for coding tasks

Georgi Gerganov confirms that Qwen3.6-27B is highly capable for coding tasks, noting its daily use on local hardware like M2 Ultra and RTX 5090. He describes using a minimal pi agent with a short system prompt to align it with his workflow, highlighting its utility for maintaining open-source projects.

media r/LocalLLaMA · 10d ago

Best Model and Configuration for 128GB RAM 8TB M5 Max MacBook Pro

An r/LocalLLaMA thread discusses the best model and configuration for local inference on a MacBook Pro with an M5 Max, 128GB of RAM, and 8TB of storage. Suggested configurations favor quantized smaller models, such as Llama 3 8B, to keep memory overhead low and performance efficient.

media r/LocalLLaMA · 10d ago

The Case For Open-Weight Models And Why We Can't Trust Frontier Labs

The article argues for open-weight language models, emphasizing transparency and accessibility. It expresses skepticism toward frontier labs, raising concerns about their model development and openness.

media r/LocalLLaMA · 10d ago

Is DiffusionGemma really that good in a PI agent?

A Reddit post asks whether DiffusionGemma performs exceptionally well in a PI agent. The post links to an image and points to the comments section for further discussion.

media r/LocalLLaMA · 10d ago

Anthropic reversing stance on claude -p third-party usage

Anthropic is reportedly allowing third-party wrappers to use Claude via the "claude -p" command, reversing a previous restriction. The policy may still leave room for future gatekeeping, but the change contrasts with earlier bans of tools like OpenClaw and Hermes.

media r/LocalLLaMA · 10d ago

VibeThinker-3B achieves frontier math and coding performance

VibeThinker-3B, scaled from a 1.5B model, reaches frontier-level performance in math and coding tasks. It scores 94.3 on AIME'26, 80.2 on LiveCodeBench v6, 76.4 on IMO-AnswerBench, and 93.4 on IFEval, with 96.1% success on first-attempt LeetCode problems.

media r/LocalLLaMA · 10d ago

Qwen Robot Suite Announced

Aliyun has launched the Qwen Robot Suite, a new set of AI-powered robotic tools. The suite aims to enable developers to build and deploy intelligent robots with enhanced capabilities.

media Interconnects · 10d ago

Frontier Post-Training Recipe Review with Finbarr Timbers

The podcast reviews the evolution of post-training recipes in large language models, from InstructGPT to 2026 frontier models. It highlights Multi-Teacher On-Policy Distillation (MOPD) as the dominant pattern, where domain-specialist models are trained and then distilled into a general student model via on-policy distillation, scaling to over 10 teachers in models like DeepSeek V4 and Nemotron 3 Ultra.
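The distillation pattern described above can be sketched in a few lines. This is a toy illustration only: the summary does not state the loss, so the sketch assumes a per-token reverse KL objective, a common choice for on-policy distillation. The names `reverse_kl`, `mopd_loss`, the rollout dictionary layout, and the teacher callables are all hypothetical.

```python
import math


def reverse_kl(student_probs, teacher_probs, eps=1e-12):
    """KL(student || teacher) at a single token position."""
    return sum(
        s * (math.log(s + eps) - math.log(t + eps))
        for s, t in zip(student_probs, teacher_probs)
        if s > 0
    )


def mopd_loss(rollouts, teachers):
    """Average per-token reverse KL over student-generated rollouts.

    rollouts: list of dicts with keys
        "domain"        - which specialist teacher should grade the rollout
        "tokens"        - tokens sampled from the student (on-policy)
        "student_dists" - student's distribution at each position
    teachers: dict mapping domain -> callable(tokens) -> list of distributions
    """
    total, count = 0.0, 0
    for rollout in rollouts:
        # Route each rollout to the teacher that specializes in its domain.
        teacher_dists = teachers[rollout["domain"]](rollout["tokens"])
        for s, t in zip(rollout["student_dists"], teacher_dists):
            total += reverse_kl(s, t)
            count += 1
    return total / count
```

The key property is that the sequences being graded come from the student itself, so each domain teacher corrects the student on the states it actually visits rather than on teacher-written text.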

github llama.cpp · 10d ago

llama.cpp releases b9669 with backend sampling for Eagle3

llama.cpp version b9669 adds backend sampling support for Eagle3. The release includes binaries for macOS, Linux, Android, Windows, and openEuler across multiple architectures and hardware acceleration options, including Vulkan, CUDA, ROCm, OpenVINO, and SYCL.

github llama.cpp · 10d ago

llama.cpp Release b9670: Fixes and New Builds

llama.cpp release b9670 includes fixes for NVFP4 edge cases in llama-graph, such as moving post-GEMM MUL operations and restricting build_ffn to supported combinations. The release provides binaries for macOS, Linux, Android, Windows, and openEuler across multiple architectures and backend options, including CUDA, Vulkan, SYCL, and OpenVINO.

media r/LocalLLaMA · 10d ago

Why DiffusionGemma Might Excel at Tool Calls Despite Lower Base Quality

DiffusionGemma uses bidirectional attention to allow self-correction during token generation, enabling it to revise earlier tokens in a 256-token block. This capability gives it a structural advantage in generating valid tool calls, as it can correct malformed outputs that autoregressive models cannot fix once committed.
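The structural difference can be illustrated by comparing attention masks. This is a minimal sketch, not DiffusionGemma's actual implementation: it assumes a block-wise scheme in which positions inside the same block see each other in both directions while earlier blocks remain visible, and the function names are hypothetical. The post mentions 256-token blocks; a small block size keeps the example readable.

```python
def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j."""
    # Autoregressive decoding: each position sees only itself and the past.
    return [[j <= i for j in range(n)] for i in range(n)]


def block_bidirectional_mask(n, block_size):
    """Bidirectional inside a block, causal across blocks (assumed layout)."""
    return [
        [(j // block_size) <= (i // block_size) for j in range(n)]
        for i in range(n)
    ]
```

With `block_bidirectional_mask(4, 2)`, position 0 can attend to position 1 because both sit in the same block, which is what lets an earlier token be revised in light of a later one. Under `causal_mask(4)` that entry is False, so a committed token can never be conditioned on what follows it.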