Code generation — korshunov.ai

Code generation Page 1 / 14

Automating fork maintenance with AI agents

This article describes a method for automating the maintenance of software forks using AI coding agents, applying it to Cohere's fork of vLLM. The approach compresses the time required to absorb upstream releases from weeks to days by replacing manual intervention with an automated feedback loop.

media r/LocalLLaMA · 12h ago

Reddit User Seeks Private Local LLM for Technical Documentation

A Reddit user is seeking recommendations for a local large language model capable of generating high-level and low-level software designs. The workflow involves using existing templates, cross-referencing code, and integrating with agentic frameworks like OpenCode via MCP to fetch data from Confluence and Jira. The user currently relies on Opus 3.6 through Kiro-cli but requires a solution that ensures data privacy. Key technical constraints include the necessity for at least 256k context length and strong reasoning capabilities. The poster questions whether hardware such as four RTX 3090 GPUs is necessary to achieve this level of performance locally.

media r/LocalLLaMA · 14h ago

Building a Bash-Based LLM Agent REPL with Minimal Dependencies

A developer created a custom agent REPL loop using exclusively standard command-line building blocks to minimize dependencies. The system relies on pipes, text streams, and append-only logs, aligning closely with classic Unix philosophy. This approach allows for flexible injection of tools to inspect, filter, redirect, and audit various stages of the agent loop. Key features include a plug-and-play backend scoped to a single command-line tool, ensuring portability across different model providers. Agent memory and context are stored in an append-only history file, enabling easy introspection, modification, and rewinding. While tested with an Ollama backend, the design supports any OpenAI-API compatible REST interface. The source code for this project is available on GitHub under the repository name llayer.

arxiv arXiv cs.CL · 19h ago

Weave of Formal Thought: Uniting Rigorous Syntactic Validation with Learned Structural Representations

The authors introduce Weave of Formal Thought (WoFT), a paradigm combining rigorous syntactic validation with learned structural representations for code generation. The approach utilizes a formal engine and constrained decoder that is sound and complete regarding the full Tree-sitter specification. By augmenting generalized LR parsing with speculative lexing, the system maintains concurrent lexer-state hypotheses to admit valid program prefixes while rejecting invalid ones. Additionally, WoFT employs latent-variable fine-tuning to train models to interleave non-terminal grammar symbols directly into the generation process. This method uses the reweighted wake-sleep algorithm to optimize the importance-weighted evidence lower bound of the surface text. The model learns to selectively retain formal derivations as an adaptive structural scratchpad during inference. Experiments on Python show that fine-tuning StarCoder2-3B with this objective reduces per-token cross-entropy by 14.3% compared to a text-only baseline.

media r/LocalLLaMA · 22h ago

Local NL-to-SQL Pipeline Using Qwen3 4B and Deterministic Planning

A developer has implemented a fully local natural language to filter generation system on hardware lacking a GPU. The solution utilizes the Qwen3 4B Instruct model running via llama.cpp with CPU-only inference. Rather than generating SQL directly, the model focuses on semantic intent and structured filter selection. A deterministic query planner subsequently handles the SQL generation and optimization processes. The pipeline employs a BM25 and embedding hybrid retrieval method using FAISS for vector storage. It retrieves the top four matching examples from approximately 800 embedded semantic instances to inject into the prompt. This approach allows the system to function effectively within strict constraints of limited RAM and no internet access.

arxiv arXiv cs.CL · 1d ago

SWE-Pro Benchmark Reveals Significant Gap Between LLMs and Expert Software Optimization

The SWE-Pro benchmark addresses the lack of realistic evaluation frameworks for software performance optimization by introducing a repository-level dataset derived from 102 expert-written optimizations. Unlike previous benchmarks that oversimplify tasks, SWE-Pro pairs each task with parameterized tests to evaluate runtime, peak memory, and Time-Weighted Memory Usage under noise-aware conditions. The study reveals that current Large Language Models struggle significantly with these complex requirements, showing negligible runtime gains and nearly non-existent memory optimizations. In sharp contrast, expert implementations achieved an aggregate speedup of 15.5x and a peak memory reduction of 171.3x across the benchmark tasks. Expert-written improvements were observed in 91.2% of tasks for runtime and 65.7% for peak memory. These findings expose a substantial gap between current LLM capabilities and the demands of expert-level engineering.

media r/LocalLLaMA · 1d ago

I reverse engineered Windows Copilot into a free OpenAI-compatible API

A user has created a local API that replicates OpenAI-compatible GPT-4 functionality using Microsoft's free Copilot service. The tool logs into a Microsoft account once, runs locally on a Windows device, and exposes a server at http://localhost:8000/v1 that supports streaming and multi-turn conversations without requiring an API key or billing. It is designed for personal and educational use, and available via GitHub at https://github.com/sums001/Windows-Copilot-API.

lab Google DeepMind Blog · 2d ago

Gemini 3.5 Flash Adds Computer Use Capability

Google has introduced computer use in Gemini 3.5 Flash, enabling the model to execute code and interact with external tools. This feature allows users to run programming tasks and access real-time information through integrated computing functions.

media r/LocalLLaMA · 2d ago

Has anyone else found vLLM outputs worse than llama.cpp?

A user reports noticing less reliable outputs from vLLM compared to llama.cpp, including formatting errors, context forgetting, and lower code quality. They ask whether such differences stem from quantization, chat templates, parser issues, or configuration errors, and seek confirmation if others have observed similar quality discrepancies between inference backends.

media r/LocalLLaMA · 2d ago

Build a LLM from Scratch using MLX

A developer created a Nano LLM with 20.2M parameters on a MacBook Air using the MLX framework. The project demonstrates that building a large language model from scratch is feasible with minimal hardware and basic Python knowledge.

media r/LocalLLaMA · 2d ago

llama.cpp web UI adds optional JavaScript execution via Web Workers

llama.cpp's web UI now supports executing JavaScript generated by language models in the browser using Web Workers, enabled via an opt-in setting. The code runs in a sandboxed iframe with security restrictions, though network requests appear disabled and the allowed sandbox capabilities lack clear documentation.

media r/LocalLLaMA · 2d ago

Dual GPU Sanity Check: Is This a Smart Buy?

A user asks whether adding a GTX 5060 Ti 16GB to their existing RTX 5090 setup is worth it for better VRAM to run larger LLMs and extend ComfyUI video generation. The upgrade would allow using Qwen 3.6 with 256K context and improve 1440p video generation, though performance gains in ComfyUI are limited due to current software constraints.

media r/LocalLLaMA · 2d ago

Qwen-AgentWorld-35B-A3B for Coding?

The Qwen-AgentWorld-35B-A3B model shows strong performance in coding tasks, with a 65.63% score on Software Writing Evaluation and 65.92% overall benchmark. It outperforms Qwen3.5-35B-A3B and rivals larger models in agent-based tasks, with a first impression noting superior accuracy in long-term agent workflows.

media r/LocalLLaMA · 2d ago

Gemma 4 26BA4B Surprisingly Usable at IQ3_S

A user reports that Gemma 4 26B quantized to Q3 runs at 25 tokens per second on a MacBook Air, performing nearly as well as bf16 for non-coding, tool-calling tasks. They question whether this performance reflects confirmation bias or if small quantized models are genuinely usable.

arxiv arXiv cs.AI · 2d ago

Text2DSL: LLM-Based Code Generation for Domain-Specific Languages

This paper introduces Text2DSL, a distinct task of generating domain-specific language code from natural language. Using the PolkitBench dataset of 4,204 validated pairs, it shows that structured context—such as BNF grammar and API specs—boosts syntactic and structural validity and CodeBLEU scores by 60% to 95% across different LLM models, without fine-tuning.

media r/LocalLLaMA · 2d ago

Qwen3.6 27B more dumb in vLLM compared to llama.cpp

A user reports that Qwen3.6-27B runs significantly less intelligently in vLLM than in llama.cpp, exhibiting issues like ignoring messages, hallucinating tool calls, and failing to recognize prior conversation context. Despite proper configuration and prompt templates, the model appears to lose coherence and misinterprets its own tool usage, with errors occurring consistently rather than sporadically.

github llama.cpp · 2d ago

vulkan-shaders-gen now fails build on shader compilation errors

The vulkan-shaders-gen tool now detects and fails the build when shader compilation fails, preventing the creation of a broken libggml-vulkan. This fix addresses a prior issue where build success masked runtime failures, and includes improvements to error handling and atomic flag management across platforms.

arxiv arXiv cs.LG · 2d ago

TRIZ-Inspired Text-to-CAD Framework Enhances Creative Design

A TRIZ-inspired text-to-CAD framework uses large language models to generate creative, editable 3D CAD models by integrating inventive principles from patent data. In a chair design case study, it achieved 4.0-14.7% mass reduction while preserving structural integrity through principles like segmentation and composite materials.

arxiv arXiv cs.LG · 2d ago

CAT-Translate: Compact Japanese-English Translation Models

CAT-Translate introduces a family of small, open-source models (0.8B to 7B parameters) specialized for Japanese-English bidirectional translation. Using synthetic parallel corpora and a two-stage fine-tuning approach with Multi-Objective GRPO, the models outperform multilingual models on real-world benchmarks across business, legal, medical, financial, and patent domains.

arxiv arXiv cs.LG · 2d ago

ASCII Art Enables Text-Only LLMs to Control VLA Systems

A text-only large language model can be adapted into a Vision--Language--Action controller by using ASCII-rendered visual observations. This approach allows LLMs to interpret visual states through text, enabling them to follow natural-language instructions and generate executable actions in both simulation and on physical manipulators.