All articles — korshunov.ai — ML news


media r/LocalLLaMA · 12d ago

SupraLabs Releases SupraVL-Nano-900k Vision-Language Model

SupraLabs has launched SupraVL-Nano-900k, a fully transparent, 900k-parameter vision-language model trained from scratch on Flickr8k. It features a CNN visual encoder, GPT-2-style decoder, and prefix concatenation fusion, with all components openly documented and designed for educational clarity.

media r/LocalLLaMA · 12d ago

Adding a Second GPU to X670E Motherboard for Local LLMs

A user wants to add a second 16GB VRAM GPU (5060 Ti or 5070 Ti) to their MSI X670E Tomahawk WiFi motherboard for running large local LLMs like Qwen 3.6 27B. The current setup lacks space for a second GPU because the primary 5070 Ti occupies the second PCIe slot, leaving only the third slot partially available. The user seeks advice on feasible options, such as using the fourth PCIe slot or a riser, while considering cooling, stability, and physical fit, especially with a horizontal GPU mount like the Lian Li VG4v4.

media r/LocalLLaMA · 12d ago

Best Harness for Web Searching

Users report that tools like LM Studio and Odysseus are limited by search engine request caps, often at 10 per day or hour, without API access. They suggest creating DuckDuckGo API accounts for better search access, but note that frontends rarely prompt for this. The post asks whether Hermes or Pi offer improved solutions.

media r/LocalLLaMA · 12d ago

What's more impressive, GLM 5.1 to 5.2 or Qwen 3.5 to 3.6?

A Reddit post compares the performance improvements of GLM 5.1 to 5.2 and Qwen 3.5 to 3.6. The post notes that mentioning 'Döner' activates GLM 5.2's German-specific weights, while Qwen 3.6 is evaluated with 35B parameters using Unsloth Q8 K XL quantization via llama.cpp.

media Interconnects · 12d ago

Banning Open Source AI Would Be a Mistake

The article argues that banning open source AI would be a grave mistake, as it is safe, secure, and drives innovation, education, and competition. Open source has long powered technological progress and serves as a vital counterweight to monopolistic AI models, ensuring broader access and democratic innovation without compromising safety or security.

media r/LocalLLaMA · 12d ago

Is My CPU and RAM Too Weak for Local LLMs?

A user reports that their CPU and RAM reach 100% utilization during simple test prompts while the GPU is underutilized. They ask whether their RTX 3050 8GB GPU can run Qwen3.5:9b locally, noting that in theory it should be feasible.
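As a rough illustration of why a 9B-parameter model can be feasible on an 8 GB GPU "in theory", the sketch below estimates the memory taken by the weights alone at a few bit widths. The bit widths are illustrative assumptions, not details from the post, and the estimate ignores the KV cache, activations, and runtime overhead, so real usage will be higher.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GiB (weights only)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Hypothetical bit widths for illustration; real quantized formats
# usually store slightly more than the nominal bits per weight.
for bits in (16, 8, 4):
    print(f"9B weights at {bits}-bit: {weight_gib(9, bits):.2f} GiB")
```

Under these assumptions, only the lower bit widths leave room within 8 GB. If the weights do not fit in VRAM, the runtime may keep part of the model in system memory, which would be consistent with high CPU and RAM load and an idle GPU.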

github llama.cpp · 12d ago

llama.cpp Release b9724 with Bug Fixes and Cross-Platform Binaries

llama.cpp version b9724 includes several bug fixes and improvements, such as build fixes, overflow avoidance in the area() function, and a sanity check in get_u32(). The release provides pre-built binaries for macOS (arm64 and x64), Linux (x64, arm64, s390x, Vulkan, ROCm, OpenVINO, SYCL), Android (arm64), Windows (x64, arm64, CUDA 12/13, Vulkan, OpenVINO, SYCL, HIP), and openEuler (x86 and aarch64 with ACL Graph support), along with a UI package.

media r/LocalLLaMA · 12d ago

Watching a Local AI Voice Assistant Get Dumber

A test on an RTX 5060 Ti showed that reducing a local AI voice assistant's model size from 9B to 0.8B leads to a sharp decline in capability. The 9B model handles tool orchestration well, while smaller models show increasing failures: the 4B model skips tool calls and guesses facts, the 2B model suffers semantic drift, and the 0.8B model fails to operate agent functions, triggering wrong APIs or infinite loops.

media r/LocalLLaMA · 12d ago

GLM-5.2 is the new leading open weights model on the Artificial Analysis Intelligence Index

GLM-5.2 now ranks as the leading open weights model on the Artificial Analysis Intelligence Index.

media r/LocalLLaMA · 12d ago

The Eagle3 has landed for Qwen

The Eagle3 speculative decoding model is now available in llama.cpp's latest release via --spec-type draft-eagle3. It requires a draft model, such as Ex0bit-Qwen3.6-27B-PRISM-EAGLE3-GGUF, and can be used with -md or --model-draft. Performance is comparable to draft-mtp, though tensor parallelism is not supported and VRAM usage is higher.
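A minimal sketch of how the flags named above might be assembled into a command line. The `--spec-type draft-eagle3` and `--model-draft` flags come from the summary; the `llama-server` binary name and both `.gguf` file names are assumptions chosen for illustration, so substitute whatever binary and model files you actually use.

```python
import shlex


def eagle3_command(target_model: str, draft_model: str,
                   binary: str = "llama-server") -> list[str]:
    """Build an argument list that enables Eagle3 speculative decoding."""
    return [
        binary,                         # assumed binary name
        "-m", target_model,             # main (target) model
        "--spec-type", "draft-eagle3",  # flag named in the summary
        "--model-draft", draft_model,   # long form of -md
    ]


# Hypothetical file names, for illustration only.
cmd = eagle3_command("target-model.gguf", "eagle3-draft.gguf")
print(shlex.join(cmd))
```

Building the arguments as a list rather than a single string avoids shell-quoting problems when the list is later passed to `subprocess.run`.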

media r/LocalLLaMA · 12d ago

New Agentic Benchmark Released

Artificial Analysis has introduced a new agentic benchmark that evaluates large language models' ability to plan and execute tasks. Claude Fable and GLM 5.2 achieved top positions within their respective cohorts, demonstrating strong performance on this unsaturated benchmark.

github llama.cpp · 12d ago

llama.cpp release b9723 adds support for Qwen3.5 and Qwen3.6 Eagle3

llama.cpp version b9723 adds Eagle3 speculative decoding support for Qwen3.5 and Qwen3.6 models. The release also defers the restore of boundary checkpoints for hybrid models and updates the API and naming conventions. Binary builds are available for macOS, Linux, Android, Windows, and openEuler platforms, with options for CPU, Vulkan, OpenVINO, SYCL, and ROCm.

media r/LocalLLaMA · 12d ago

Spec: Support Eagle3 for Qwen3.5 & 3.6 by ruixiang63

A pull request adds Eagle3 support for Qwen3.5 and Qwen3.6 to llama.cpp. It was authored by ruixiang63 and submitted to the ggml-org/llama.cpp repository.

media r/LocalLLaMA · 12d ago

Has anyone used VibeThinker-3B outside benchmarks?

A Reddit user asks about real-world performance of VibeThinker-3B beyond benchmark scores, focusing on debugging, coding, reasoning, latency, and usability. The model is available on Hugging Face and described in a paper on arXiv.

github llama.cpp · 12d ago

llama.cpp Release b9722: Fixes and Cross-Platform Binaries

llama.cpp version b9722 fixes an issue with a non-bound n_discard value in server context handling. The release includes precompiled binaries for macOS, Linux, Android, Windows, and openEuler, supporting various architectures and acceleration frameworks such as Vulkan, CUDA, OpenVINO, and SYCL.

media r/LocalLLaMA · 12d ago

Anyone here rocking dual RTX 5090s?

A user asks if anyone has recently built a dual RTX 5090 setup, noting their current dual RTX 3090 system performs well for software development. They mention upgrading to dual RTX 5090s is expensive and consider their dorm power outlets a potential limitation.

media r/LocalLLaMA · 12d ago

Local LLM Censorship Reported on Reddit

Users report that local language models are refusing to answer questions without guardrails, raising concerns about censorship in decentralized AI setups. The issue was shared on Reddit's LocalLLaMA community, where users describe instances of models blocking responses to legitimate queries.

media r/LocalLLaMA · 12d ago

Multi Doc Agent Workflows in Word

A blog post details how to implement multi-document agent workflows in Microsoft Word using local LLMs. The guide outlines steps to enable agents to process and interact with multiple documents within a single Word environment.

media r/LocalLLaMA · 12d ago

The meme must go on

A Reddit post titled 'The meme must go on' shares an image of a meme related to local LLaMA models. The post is submitted by user /u/ego100trique and includes a link to the image and comments section.

media r/LocalLLaMA · 12d ago

EvoTensile: Evolutionary tuning of AMD Tensile GEMM kernels

EvoTensile uses evolutionary algorithms to tune GEMM kernels for AMD GPUs, doubling NT layout performance from 20 to 40 TFLOPS on Strix Halo. The tuned kernels still fall short of the theoretical roofline of 59.4 TFLOPS.
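The reported figures can be put in proportion with a few lines of arithmetic. The sketch below uses only the three numbers quoted in the summary; the variable names are illustrative.

```python
baseline_tflops = 20.0   # reported NT layout throughput before tuning
tuned_tflops = 40.0      # reported throughput after tuning
roofline_tflops = 59.4   # reported theoretical roofline

speedup = tuned_tflops / baseline_tflops
roofline_fraction = tuned_tflops / roofline_tflops
headroom_tflops = roofline_tflops - tuned_tflops

print(f"speedup over baseline: {speedup:.1f}x")
print(f"fraction of roofline:  {roofline_fraction:.1%}")
print(f"remaining headroom:    {headroom_tflops:.1f} TFLOPS")
```

By these numbers, the tuned kernels reach about two-thirds of the roofline, up from about one-third before tuning.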