All articles — korshunov.ai

vulkan: make TP viable by pwilkin · Pull Request #25051

A pull request to the ggml-org/llama.cpp repository by Piotr (pwilkin) aims to make tensor parallelism (TP) practical on the Vulkan backend.

media r/LocalLLaMA · 7h ago

Developer builds local-first LLM harness and seeks community feedback

A developer with 45 years of software experience is finishing a local-first harness for running both local and API models, with logic for coordinating multiple agents. After six months of building tools to improve the local LLM workflow, the author is asking the community which features would improve their experience.

media r/LocalLLaMA · 7h ago

Why do people keep investing in Intel for AI?

The article questions the rationale behind Wall Street's classification of Intel as an "AI picks and shovels" investment, asking who is actually purchasing Intel hardware for AI data centers.

media r/LocalLLaMA · 8h ago

Reddit user seeks advice on multi-model backends and config swapping

A Reddit user is planning to deploy a machine with multiple GPUs for serving coding and Hermes models, seeking solutions that allow flexible configuration swapping without manual intervention.

media r/LocalLLaMA · 8h ago

Consider post-training instead of benchmarking for new hardware

The author argues that new hardware is better spent on supervised fine-tuning (SFT) and reinforcement fine-tuning (RFT) than on standard model benchmarking. In the author's view, post-training open models offers a viable path to monetization, especially as proprietary APIs become less accessible or more expensive.

blog Simon Willison · 8h ago

2,000 people tried to hack my AI assistant

Fernando Irarrázaval ran a challenge on hackmyclaw.com inviting people to leak secrets from his OpenClaw instance, which uses the Opus 4.6 model. The challenge drew 6,000 attempts.

blog Simon Willison · 8h ago

Spectacular hypothetical incident report by Andrew Nesbitt

Andrew Nesbitt published a speculative incident report detailing a scenario where two AI review agents from competing vendors enter a disagreement loop over the safety of the 'foxhole-lz4' package.

media r/LocalLLaMA · 8h ago

Streaming medical STT running locally on a MacBook

A developer has built a streaming medical speech-to-text (STT) model that runs fully on-device, demonstrated with MLX on a MacBook. Further evaluations are underway, and open weights are planned for release next week.

media r/LocalLLaMA · 8h ago

Book Review: Domain-Specific Small Language Models by Guglielmo Iozzia

This review evaluates Guglielmo Iozzia's book "Domain-Specific Small Language Models," which advocates a shift from generalist large language models to specialized, fine-tuned small language models (SLMs). The reviewer argues that, for narrow tasks, SLMs offer better control, visibility, and cost-efficiency than generalist models, and are a more grounded focus than the hype surrounding artificial general intelligence.

media r/LocalLLaMA · 8h ago

Distill-on-idle pipeline for on-device memory assistant using 4B models

The article details an engineering approach to building a local AI assistant that turns raw screen captures and meeting transcripts into queryable data, using only models small enough to run efficiently on a laptop. The system relies on Apple's Vision framework for OCR, a 4B Gemma model that performs distillation while the machine is idle, and hybrid retrieval to avoid performance bottlenecks.

blog Simon Willison · 8h ago

OpenAI previews GPT-5.6 series with Sol, Terra, and Luna models

OpenAI has initiated a limited preview of the GPT-5.6 model series, introducing three distinct variants: Sol as the flagship, Terra for balanced everyday work, and Luna for fast, affordable tasks. The company plans to make these models generally available in the coming weeks following this initial phase with trusted partners.

media r/LocalLLaMA · 8h ago

User asks for advice on utilizing 8 Tesla T4 GPUs

A Reddit user has acquired eight Tesla T4 datacenter cards from retired VDI servers. One card is already working in a DEG1 chassis, and the user is asking for use cases or configuration strategies for the remaining seven.

media r/LocalLLaMA · 9h ago

Considering upgrade from 2 x RTX 3090s to 4 x 5070 TI

A user on r/LocalLLaMA is considering upgrading their hardware setup from two RTX 3090 GPUs to four RTX 5070 Ti cards, specifically evaluating the performance implications for single-stream inference.

media r/LocalLLaMA · 9h ago

Open-sourcing a harness for evaluating VLMs on your own video with traced runs

The authors have open-sourced a harness for evaluating vision-language models (VLMs) on users' own video data, with full reproducibility through traced runs. Because every result is tied to its specific input and configuration, accuracy, latency, and cost can be compared reliably across models and runs.

media r/LocalLLaMA · 9h ago

Reddit Discussion: Local AI Workflows

A Reddit post in the r/LocalLLaMA community asks users to share local AI workflows that significantly improved their productivity or utility. The author specifically invites suggestions regarding RAG, MCP, coding agents, prompt organization, document indexing, and automation.

media r/LocalLLaMA · 9h ago

User asks whether to buy one RTX Pro 6000 or two DGX Sparks for local AI development

A Reddit user is weighing one RTX Pro 6000 against two DGX Sparks for running multiple small to medium-sized models locally for data parsing, extraction, and reasoning tasks. The setup would also be used for model building, testing, LoRA creation, and distillation, while large cloud models such as Opus remain reserved for complex tasks.

media r/LocalLLaMA · 9h ago

Gemma 4 12b needs glasses

A user reports frustration with Gemma 4's default image resolution settings, noting that the model struggles to read small text and to make out larger compositional elements, compared with competitors such as Qwen 3.6.

media r/LocalLLaMA · 9h ago

Planning small AI RIG, 5 X 5060ti 16GB, after selling my 5090

A user on Reddit is asking for feedback on a plan to sell their Zotac Solid RTX 5090 (currently paired with 128GB of system RAM) and replace it with five RTX 5060 Ti 16GB cards.

media r/LocalLLaMA · 9h ago

vibe shift: I can see this coming...

The post consists only of a title, with no accompanying text or further details.

media r/LocalLLaMA · 9h ago

Reddit user proposes combining RTX 5080 and 4060 for local LLM inference

A Reddit user in the r/LocalLLaMA community is considering pairing a future RTX 5080 with their existing RTX 4060 to improve inference speed and capacity for Qwen models. The goal is at least 20 to 40 tokens per second on Qwen 27B models, using the combined 24GB of VRAM through tensor or layer splitting in llama.cpp or vLLM. The user is weighing this asymmetric dual-GPU setup against alternatives such as the AMD R9700 AI Pro or 7900 XTX, citing benchmark data suggesting that the AMD cards offer limited performance gains relative to their cost.