Code generation — korshunov.ai

Local agent on 4090 - looking for LM Studio settings

A user reports slow token generation when running a local agent on a 4090 with 24 GB of VRAM, even after adjusting context and batching settings. They note that Gemma 4 runs faster but emits stray tokens such as </tool_call>, and they ask for recommended settings and an explanation of sampling parameters such as top_p and top_k.
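For readers with the same question about top_k and top_p, here is a minimal sketch of what the two filters do to a next-token distribution. It is illustrative only and is not LM Studio's or llama.cpp's implementation. The function name `filter_top_k_top_p` and the order of operations (top-k first, then top-p on the renormalized remainder) are assumptions; real runtimes let the sampler order be configured.

```python
def filter_top_k_top_p(probs, top_k=0, top_p=1.0):
    """Return {token_id: probability} for the tokens that survive both filters.

    probs  : list of next-token probabilities, indexed by token id
    top_k  : keep only the k most likely tokens (0 disables the filter)
    top_p  : keep the smallest set of tokens whose cumulative
             probability reaches top_p (1.0 disables the filter)
    """
    # Rank token ids from most to least likely.
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)

    # top_k: hard cap on the number of candidates.
    if top_k > 0:
        ranked = ranked[:top_k]

    # Renormalize so the survivors sum to 1 before applying top_p
    # (assumed ordering; see the note above).
    total = sum(p for _, p in ranked)
    ranked = [(token_id, p / total) for token_id, p in ranked]

    # top_p (nucleus): walk down the ranking until enough mass is covered.
    kept = []
    cumulative = 0.0
    for token_id, p in ranked:
        kept.append((token_id, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Final renormalization over the kept tokens.
    total = sum(p for _, p in kept)
    return {token_id: p / total for token_id, p in kept}


if __name__ == "__main__":
    # Toy distribution over four token ids.
    print(filter_top_k_top_p([0.5, 0.3, 0.15, 0.05], top_k=3, top_p=0.7))
```

Lower values of either parameter shrink the candidate set, which makes output more predictable; whether that helps with stray tool-call tokens depends on the model and chat template, which this sketch does not address.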

media r/LocalLLaMA · 5d ago

SupraLabs Releases supra-title-FFT-preview with 115K Samples

SupraLabs has launched supra-title-FFT-preview, a chat title generation model trained on 115K samples from a filtered dataset, expanding coverage beyond its previous 12K-sample model. The model uses full fine-tuning on LiquidAI/LFM2.5-350M-Base with BF16 precision and is designed for single-purpose chat title generation, available via Hugging Face and supporting direct loading or vLLM deployment.

media r/LocalLLaMA · 5d ago

I benchmarked Claude's 'Fast C++'. It wasn't faster

A user tested Claude's claimed 'Fast C++' implementation and found it did not outperform standard C++ in benchmarks. The post includes a link to a Substack article detailing the testing process and results.

media r/LocalLLaMA · 5d ago

$1800 GPU cost runs Qwen3.6-27B with 262K context and 55 tok/s

A setup using four 5060 Ti GPUs (totaling $1800) achieves 55 tokens per second with Qwen3.6-27B-FP8, supporting a 262K context length and a bfloat16 KV cache. The configuration uses P2P and FlashInfer, with benchmark results showing an output throughput of 55.67 tokens per second and a 65.25% speculative decoding acceptance rate.

blog Simon Willison · 5d ago

Sean Lynch on MCP's Auth Flow Isolation

Sean Lynch highlights that the Model Context Protocol (MCP) offers a key advantage by isolating authentication flows outside the agent's context window. He suggests the ideal form of MCP could be a simple auth gateway for APIs, which would still represent a significant improvement.

media r/LocalLLaMA · 5d ago

Help Running Local Hermes Agent with llama-cpp

A user reports issues running a local Hermes AI agent on a high-end rig using self-compiled llama.cpp. The setup reprocesses the KV cache every 5 messages and reasons slowly, and the agent repeatedly pauses to report progress instead of continuing autonomously. The user asks whether their llama.cpp parameters are incorrect and what adjustments would improve agent performance and sustained, uninterrupted reasoning.

media r/LocalLLaMA · 5d ago

SupraLabs Releases SupraVL-Nano-900k Vision-Language Model

SupraLabs has launched SupraVL-Nano-900k, a fully transparent, 900k-parameter vision-language model trained from scratch on Flickr8k. It features a CNN visual encoder, GPT-2-style decoder, and prefix concatenation fusion, with all components openly documented and designed for educational clarity.

media r/LocalLLaMA · 5d ago

How to Set Optimal llama.cpp Parameters for AMD GPU

Users seeking optimal llama.cpp settings for Gemma 4 models on an AMD GPU with 16 GB of VRAM ask whether trial and error is necessary. They reference Google's default settings for temperature, top-p, and top-k but report inconsistent results, indicating a need for more targeted guidance than the official documentation provides.

media r/LocalLLaMA · 5d ago

How to Set Up Search with AI Models

A user asks how to give a self-hosted Gemma 4 12B model search capabilities. They mention trying Open WebUI, which has issues with search engines such as DuckDuckGo (DDG), and seek alternatives that do not require Brave or Google API keys.

media r/LocalLLaMA · 5d ago

Commission selects EUROPA consortium as winner of Frontier AI Grand Challenge

The European Commission has chosen the EUROPA consortium, led by Domyn, to develop an open-source frontier AI model in all 24 EU languages. The project, launched in February 2026, aims to create a model with over 400 billion parameters, showcasing Europe's capacity to build advanced AI on its own infrastructure.

media r/LocalLLaMA · 5d ago

Improving local models with an API-based consultant agent

A user asks whether adding a powerful API-based 'consultant' agent, such as GLM 5.2, could enhance local AI workflows by refining plans and learning processes. The post explores the potential benefits of such an agent in improving local model performance through external consultation.

media r/LocalLLaMA · 5d ago

The economics of AI are starting to favor open models

Recent AI model releases show that high-intelligence, low-cost models are increasingly dominated by open-weight models like DeepSeek, Qwen, GLM, Kimi, and MiniMax. For most real-world applications, the performance gap between frontier closed models and strong open models is shrinking faster than cost differences, making open models competitive in terms of both capability and price.

media Don't Worry About the Vase · 5d ago

Claude Fable 5 and Mythos 5: Capabilities

Anthropic launched Claude Fable 5, a Mythos-class model claiming state-of-the-art performance across software engineering, scientific research, and knowledge work. It was quickly taken down by the U.S. government after a jailbreak was reported, though Anthropic asserts it is now available again, with Fable 5 showing exceptional capabilities and a more nuanced, thoughtful reasoning style compared to prior models.

media r/LocalLLaMA · 6d ago

The Eagle3 has landed for Qwen

The Eagle3 speculative decoding model is now available in llama.cpp's latest release via --spec-type draft-eagle3. It requires a draft model, such as Ex0bit-Qwen3.6-27B-PRISM-EAGLE3-GGUF, and can be used with -md or --model-draft. Performance is comparable to draft-mtp, though tensor parallelism is not supported and VRAM usage is higher.

media r/LocalLLaMA · 6d ago

Has anyone used VibeThinker-3B outside benchmarks?

A Reddit user asks about real-world performance of VibeThinker-3B beyond benchmark scores, focusing on debugging, coding, reasoning, latency, and usability. The model is available on Hugging Face and described in a paper on arXiv.

github llama.cpp · 6d ago

llama.cpp release b9718: consolidated slot selection and new binary builds

llama.cpp version b9718 consolidates slot selection into a single function, get_available_slot, while maintaining LCP similarity checks for prompt cache updates. The release includes binary builds for macOS, Linux, Android, Windows, and openEuler across multiple architectures and hardware acceleration options.
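To illustrate what an LCP (longest common prefix) similarity check for slot selection can look like, here is a small sketch. It is not llama.cpp's `get_available_slot`; the function `pick_slot`, the `min_similarity` threshold, and the choice to divide the prefix length by the incoming prompt length are all assumptions made for illustration.

```python
def common_prefix_len(a, b):
    """Number of leading tokens that two token lists share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_slot(slots, prompt_tokens, min_similarity=0.5):
    """Pick the slot whose cached tokens best match the start of the prompt.

    slots          : dict mapping slot id -> list of cached token ids
    prompt_tokens  : token ids of the incoming prompt
    min_similarity : hypothetical threshold below which no slot is reused

    Returns the chosen slot id, or None if no slot is similar enough.
    """
    if not prompt_tokens:
        return None

    best_id, best_sim = None, 0.0
    for slot_id, cached in slots.items():
        # Similarity here is the shared-prefix length relative to the
        # incoming prompt length (an assumed definition).
        sim = common_prefix_len(cached, prompt_tokens) / len(prompt_tokens)
        if sim >= min_similarity and sim > best_sim:
            best_id, best_sim = slot_id, sim
    return best_id


if __name__ == "__main__":
    slots = {0: [1, 2, 3, 4], 1: [1, 2, 9], 2: []}
    print(pick_slot(slots, [1, 2, 3, 5, 6]))
```

Reusing the slot with the longest shared prefix means fewer prompt tokens need to be reprocessed, which is the general motivation for this kind of check.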

media r/LocalLLaMA · 6d ago

Little late thank you to the DeepSeek team!

A user thanked the DeepSeek team for releasing DeepSeek V4 Pro and its Flash version, which fits on local hardware. The post was made seven months after an initial Reddit post.

media Latent Space · 6d ago

GLM-5.2 Passes Vibe Check, Outperforms GPT-5.5

GLM-5.2 has passed a 'vibe check' as a frontier open model, receiving praise from Jeremy Howard and outperforming GPT-5.5 in Artificial Analysis' new knowledge work benchmark. It also gained validation from the r/LocalLLaMA community, indicating strong real-world utility and performance.

media r/LocalLLaMA · 6d ago

How can I self host code review?

A user asks about self-hosting code review tools due to Gemini Code Assist ending consumer support and moving to enterprise only. They are exploring GitHub apps or actions for local or cloud-based solutions.

github llama.cpp · 6d ago

llama.cpp release b9715 adds CUDA Col2Im 1D and multiple platform binaries

llama.cpp version b9715 introduces CUDA support for GGML_OP_COL2IM_1D, building on a CPU implementation. The release includes binaries for macOS, Linux, Android, Windows, and openEuler across multiple architectures and acceleration frameworks, including Vulkan, ROCm, OpenVINO, and SYCL.