Code generation
media r/LocalLLaMA · 12h ago

Reddit User Seeks Private Local LLM for Technical Documentation

A Reddit user is seeking recommendations for a local large language model capable of generating high-level and low-level software designs. The workflow involves using existing templates, cross-referencing code, and integrating with agentic frameworks like OpenCode via MCP to fetch data from Confluence and Jira. The user currently relies on Opus 3.6 through Kiro-cli but requires a solution that ensures data privacy. Key technical constraints include the necessity for at least 256k context length and strong reasoning capabilities. The poster questions whether hardware such as four RTX 3090 GPUs is necessary to achieve this level of performance locally.

media r/LocalLLaMA · 14h ago

Building a Bash-Based LLM Agent REPL with Minimal Dependencies

A developer created a custom agent REPL loop using exclusively standard command-line building blocks to minimize dependencies. The system relies on pipes, text streams, and append-only logs, aligning closely with classic Unix philosophy. This approach allows for flexible injection of tools to inspect, filter, redirect, and audit various stages of the agent loop. Key features include a plug-and-play backend scoped to a single command-line tool, ensuring portability across different model providers. Agent memory and context are stored in an append-only history file, enabling easy introspection, modification, and rewinding. While tested with an Ollama backend, the design supports any OpenAI-API compatible REST interface. The source code for this project is available on GitHub under the repository name llayer.

arxiv arXiv cs.CL · 19h ago

Weave of Formal Thought: Uniting Rigorous Syntactic Validation with Learned Structural Representations

The authors introduce Weave of Formal Thought (WoFT), a paradigm combining rigorous syntactic validation with learned structural representations for code generation. The approach utilizes a formal engine and constrained decoder that is sound and complete regarding the full Tree-sitter specification. By augmenting generalized LR parsing with speculative lexing, the system maintains concurrent lexer-state hypotheses to admit valid program prefixes while rejecting invalid ones. Additionally, WoFT employs latent-variable fine-tuning to train models to interleave non-terminal grammar symbols directly into the generation process. This method uses the reweighted wake-sleep algorithm to optimize the importance-weighted evidence lower bound of the surface text. The model learns to selectively retain formal derivations as an adaptive structural scratchpad during inference. Experiments on Python show that fine-tuning StarCoder2-3B with this objective reduces per-token cross-entropy by 14.3% compared to a text-only baseline.

media r/LocalLLaMA · 22h ago

Local NL-to-SQL Pipeline Using Qwen3 4B and Deterministic Planning

A developer has implemented a fully local natural language to filter generation system on hardware lacking a GPU. The solution utilizes the Qwen3 4B Instruct model running via llama.cpp with CPU-only inference. Rather than generating SQL directly, the model focuses on semantic intent and structured filter selection. A deterministic query planner subsequently handles the SQL generation and optimization processes. The pipeline employs a BM25 and embedding hybrid retrieval method using FAISS for vector storage. It retrieves the top four matching examples from approximately 800 embedded semantic instances to inject into the prompt. This approach allows the system to function effectively within strict constraints of limited RAM and no internet access.

arxiv arXiv cs.CL · 1d ago

SWE-Pro Benchmark Reveals Significant Gap Between LLMs and Expert Software Optimization

The SWE-Pro benchmark addresses the lack of realistic evaluation frameworks for software performance optimization by introducing a repository-level dataset derived from 102 expert-written optimizations. Unlike previous benchmarks that oversimplify tasks, SWE-Pro pairs each task with parameterized tests to evaluate runtime, peak memory, and Time-Weighted Memory Usage under noise-aware conditions. The study reveals that current Large Language Models struggle significantly with these complex requirements, showing negligible runtime gains and nearly non-existent memory optimizations. In sharp contrast, expert implementations achieved an aggregate speedup of 15.5x and a peak memory reduction of 171.3x across the benchmark tasks. Expert-written improvements were observed in 91.2% of tasks for runtime and 65.7% for peak memory. These findings expose a substantial gap between current LLM capabilities and the demands of expert-level engineering.

media r/LocalLLaMA · 1d ago

I reverse engineered Windows Copilot into a free OpenAI-compatible API

A user has created a local API that replicates OpenAI-compatible GPT-4 functionality using Microsoft's free Copilot service. The tool logs into a Microsoft account once, runs locally on a Windows device, and exposes a server at http://localhost:8000/v1 that supports streaming and multi-turn conversations without requiring an API key or billing. It is designed for personal and educational use, and available via GitHub at https://github.com/sums001/Windows-Copilot-API.