Code generation — korshunov.ai

I built a novel triple-hybrid LLM under 1B parameters for ~$50

Mateusz has pre-trained a complete language model, Project Inkblot's Titan v1, which combines Mamba SSM, multi-head attention, and a 32-expert MoE in a single decoder-only architecture under 1B parameters. Trained on a single NVIDIA L4 GPU for ~$50, the model reaches a validation perplexity of 27.5, and the author reports that it scales up through a single-line config update. All components are implemented from scratch in PyTorch. Titan v2's first training cycle is now complete, and dataset expansion is underway.
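For context on the 27.5 figure, here is a minimal sketch of how perplexity relates to training loss. It assumes the reported value is the exponential of the mean per-token cross-entropy measured in nats, which is the usual convention but is not stated in the post. The function names are illustrative.

```python
import math

def perplexity_from_loss(mean_nll_nats: float) -> float:
    """Perplexity is exp(mean negative log-likelihood per token, in nats)."""
    return math.exp(mean_nll_nats)

def loss_from_perplexity(perplexity: float) -> float:
    """Inverse mapping: the mean per-token cross-entropy implied by a perplexity."""
    return math.log(perplexity)

# Assumption: 27.5 is a per-token perplexity under the model's own tokenizer.
implied_loss = loss_from_perplexity(27.5)   # about 3.314 nats per token
round_trip = perplexity_from_loss(implied_loss)
```

Perplexity depends on the tokenizer and the validation set, so the number is only comparable between models evaluated on the same data with the same vocabulary.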

media Hugging Face Forums · 3d ago

ML Surrogate Models in CFD/FEA: Real-World Practices and Challenges

Engineering practitioners report that graph neural networks and MLPs on parameterized designs offer the best practical balance for predicting fields like temperature and stress. Data efficiency is achievable with 10–50 training samples, especially when transfer learning is applied across similar geometries. Physics-informed neural networks (PINNs) remain largely experimental for complex engineering geometries, with most users relying on data-driven surrogates. Generalization remains a key challenge, with models often failing on out-of-distribution boundary conditions, prompting a return to full solver runs.

media r/LocalLLaMA · 3d ago

Ling and Ring 2.6 Technical Report Releases Trillion-Parameter Models

Base models for Ling-2.6-1T and Ling-2.6-flash have been released with the Ling and Ring 2.6 technical report, and both are available on Hugging Face. The Ling-2.6-flash model (100B parameters) is described as enabling fast inference for users with 24-32GB of VRAM and high throughput for CPU-only inference with 32GB of RAM.

media MarkTechPost · 3d ago

The 7 Types of Agent Memory: A Technical Guide

Large language models are stateless by default, requiring memory mechanisms to retain context across interactions. The seven types of agent memory—working, semantic, episodic, procedural, retrieval, parametric, and prospective—categorize memory by form and duration, enabling agents to plan, learn, and act over time. Each type serves distinct use cases, from storing user preferences to scheduling future goals, and together they form a comprehensive system for long-horizon, context-aware AI agents.

media MarkTechPost · 3d ago

Tutorial on Building Python-First Interactive Dashboards with Prefab

This tutorial demonstrates how to create interactive dashboards in Python using Prefab's component-based UI framework. It generates synthetic pipeline data, integrates reactive controls like charts, forms, and tabs, and exports the app as a static HTML file for direct preview in Google Colab.

media Hugging Face Forums · 3d ago

The Clockwork Dark: A Local-First AI Narrative-RPG Engine

The Clockwork Dark is a local-first, AI-driven narrative-RPG engine that uses a deterministic state machine to resolve all game mechanics. It features two autonomous LLMs that narrate the story, with one acting as a patient world voice and the other as an unreliable, godlike assistant. The game offers players a choice: fight the encroaching supernatural corruption or embrace a quiet life in a bakery, with both paths considered valid endings.

lab OpenAI News · 3d ago

Samsung Deploys ChatGPT and Codex for Employees

Samsung Electronics has rolled out OpenAI's ChatGPT Enterprise and Codex to its global workforce. This deployment represents one of OpenAI's largest enterprise AI initiatives to date.

media r/LocalLLaMA · 4d ago

Qwen 27B for planning, Qwen 35B-A3B for execution

A user explores using Qwen 27B for long-horizon task planning and Qwen 35B-A3B for rapid execution, noting the 27B runs at 7-10 tokens per second and the 35B-A3B at ~18 tokens per second. The user considers switching between models to leverage their different strengths, though currently uses the 35B-A3B exclusively and questions whether the intelligence gap between models is significant.

media r/LocalLLaMA · 4d ago

Updated Vision Model Benchmark Results and Recommendations

A revised benchmark of local vision language models evaluates 23 models across 30 images with 3 tests each, totaling 2,070 tests and 60 to 70 inference hours. The top-performing model is Qwen3.6 27B (nothink) at Q4 with a 79.6 score, followed by Qwen3.5 4B (nothink) at Q4, and Qwen3-VL 8B at Q8. Key findings include thinking mode degrading vision performance, MoE models underperforming compared to dense models, and Q8 quantization not universally improving results.
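The test count and time budget quoted above can be checked with a few lines of arithmetic. This sketch uses only the numbers in the summary. The per-test time is an average derived from them, not a figure reported by the benchmark author.

```python
def total_tests(models: int, images: int, tests_per_image: int) -> int:
    """Every model runs every test on every image."""
    return models * images * tests_per_image

def mean_seconds_per_test(hours: float, tests: int) -> float:
    """Average wall-clock seconds per test for a given total runtime."""
    return hours * 3600 / tests

tests = total_tests(models=23, images=30, tests_per_image=3)   # 2070
low = mean_seconds_per_test(60, tests)    # roughly 104 s per test
high = mean_seconds_per_test(70, tests)   # roughly 122 s per test
```

The average hides large differences between models, since a 27B model at Q4 and a 4B model will not take the same time per image.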

media r/LocalLLaMA · 4d ago

I pretrained and post trained a 500M parameter LLM and 330M parameter Image generator from scratch

The author pretrained a 500M parameter language model and a 330M parameter image generator from scratch using 40B tokens from FineWeb. The image generator was inspired by ByteDance's DreamLite architecture and trained on a mixture of datasets from MidJourney, Flux, and CCW3.

media AI News (smol.ai) · 4d ago

GLM-5.2 Breakout and Open-Model Progress Highlighted

Zhipu's GLM-5.2 emerged as the top open-weight model, praised for its frontier-adjacent performance in daily use, with improvements in coding tasks and reduced 1M-token inference cost via IndexShare. It outperformed other open models in agentic knowledge work benchmarks, reaching 1266 Elo in Artificial Analysis' AA-Briefcase test, though only 3% of tasks were fully satisfied by top models, indicating persistent challenges in real-world long-horizon agent performance.

lab Hugging Face Blog · 4d ago

Can You Beat LoRA in Fine-Tuning?

A new study explores alternatives to LoRA, the most popular fine-tuning technique, assessing whether other methods can achieve better performance with less computational cost. The research finds that while some approaches show promise, none consistently outperform LoRA across diverse tasks and datasets.

lab OpenAI News · 4d ago

GPT-5.5 Instant Enhances ChatGPT's Health Responses

GPT-5.5 Instant improves ChatGPT's health and wellness responses through stronger reasoning, better context handling, clearer communication, and physician-informed evaluations.

media AI News (smol.ai) · 4d ago

GLM-5.2 Emerges as Leading Open-Weight Coding Model

GLM-5.2 is widely regarded as the first open-weight coding model that rivals frontier models like Opus 4.8 and GPT-5.5 in capability. Practitioners highlight its strong tool use, long-horizon planning, and autonomous subagent behavior, with consensus that it now credibly operates in the frontier SWE range. The model's emergence underscores the growing value of open weights for provider competition, on-prem deployment, and reduced vendor lock-in.

media r/LocalLLaMA · 4d ago

2× Radeon R9700 with Qwen 3.6 27B Q8 MTP on llama.cpp

A user reports running Qwen 3.6 27B MTP model on two Radeon R9700 GPUs via llama.cpp with ROCm 7.2.1. Tests show stable decode speeds (40–67 t/s) and prefill throughput (up to 1,500 t/s for prompts under 10k tokens), with MTP draft acceptance rates between 0.33 and 0.61.
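To put the acceptance rates in perspective, here is a rough sketch of how an aggregate draft acceptance rate translates into tokens produced per verification pass of the main model. It assumes the rate means accepted draft tokens divided by drafted tokens, and that every pass drafts the same number of tokens. The post does not say how many tokens are drafted per step, so `n_draft` is a hypothetical parameter.

```python
def tokens_per_pass(acceptance_rate: float, n_draft: int = 1) -> float:
    """Mean tokens emitted per verification pass.

    Each pass yields one token from the main model plus the accepted drafts.
    With an aggregate rate a = accepted / drafted and n_draft tokens drafted
    per pass, the mean number of accepted drafts per pass is a * n_draft.
    """
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be between 0 and 1")
    if n_draft < 0:
        raise ValueError("n_draft must be non-negative")
    return 1.0 + acceptance_rate * n_draft

# Assumption: a single drafted token per pass (n_draft=1).
low = tokens_per_pass(0.33)    # 1.33 tokens per pass
high = tokens_per_pass(0.61)   # 1.61 tokens per pass
```

This is an upper bound on the decode speedup, because drafting and verifying extra tokens add their own cost to each pass.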

media r/LocalLLaMA · 4d ago

Can I realistically get close to Claude/Codex capabilities locally?

A user with a 32GB system asks if open-weight models can match Opus 4.8's 1M context and coding performance on local hardware. They note current bottlenecks are context length and privacy concerns, and question whether high-end models like GLM 5.2 or Qwen3.7 are feasible within a $3.5K budget, emphasizing that running 70-80B models offers marginal real-world gains over 27B models with 256K context.

media r/LocalLLaMA · 4d ago

Sandboxing code execution for AI agents

The discussion covers sandboxing methods for AI agents that execute arbitrary code, comparing Docker containers, microVMs, WASM, and host-level execution. The post lists requirements for isolation, fast startup, network access control, and a filesystem that persists across executions, and asks readers to share their implementations and the tradeoffs they accepted.
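As a point of reference for the weakest option in that list, here is a minimal sketch of host-level execution: a child Python process with a timeout and a throwaway working directory. It is not a sandbox. It does not restrict network access, system calls, or reads outside the working directory, which is why the post weighs containers, microVMs, and WASM. The function name `run_untrusted` and its return shape are illustrative, not taken from the thread.

```python
import subprocess
import sys
import tempfile

def run_untrusted(code: str, timeout_s: float = 5.0) -> dict:
    """Run a code string in a separate Python process and capture its output.

    Uses -I (isolated mode) so the child ignores user site-packages and
    PYTHON* environment variables, and a temporary directory as the working
    directory. This limits runtime only; it gives no real isolation.
    """
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                [sys.executable, "-I", "-c", code],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising on timeout.
            return {"status": "timeout", "stdout": "", "stderr": "", "returncode": None}
    status = "ok" if proc.returncode == 0 else "error"
    return {
        "status": status,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "returncode": proc.returncode,
    }

result = run_untrusted("print(1 + 1)")
```

The working directory here is deleted after each call, so this sketch does not meet the persistent-filesystem requirement. A real setup would mount a per-agent volume into whichever isolation layer is chosen.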

media r/LocalLLaMA · 4d ago

Proposal for splitting base models to avoid retraining

A proposal suggests splitting model architecture into a stable base model and lightweight, swappable worker models. The base model handles core reasoning and acts as a platform, while worker models provide domain-specific knowledge through runtime hot-plugging, similar to LoRA but for knowledge rather than behavior.

media r/LocalLLaMA · 4d ago

Watch local LLMs escape the rooms you design

A new tool allows users to design escape room-style environments and watch local LLMs navigate and escape using simple actions. The project, built for Hugging Face x Gradio's 'Build Small' hackathon, supports five model presets and enables custom map creation with font-based visuals and JSON import/export. It uses a 'Think then Act' framework to enable small models to perform reliably in structured game environments.

media r/LocalLLaMA · 4d ago

GLM-5.2 Beats Gemini and GPT-5.4 in Coding but Is Inefficient

GLM-5.2 surpasses GPT-5.4 and the entire Gemini lineup in coding performance on the DeepSWE benchmark. However, it requires significantly more output tokens, making it substantially less cost-efficient per task than models like GPT-5.5 and Claude Opus 4.8.