AI agents — korshunov.ai


Microsoft Releases Open Source FastContext for LLM Coding Agents

Microsoft has open-sourced FastContext-1.0, a lightweight repository-exploration subagent that separates code repository exploration from task solving in LLM coding agents. It uses parallel read-only tool calls to return compact file paths and line ranges, improving end-to-end accuracy and reducing token usage by up to 60.3%, with the 4B-RL model outperforming a 30B-SFT model on SWE-bench Pro.
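The exploration pattern described above (parallel read-only tool calls that return only file paths and line ranges) can be sketched roughly as follows. This is an illustrative sketch, not FastContext's actual implementation: the in-memory `REPO`, the `grep` tool, and the `explore` function are all hypothetical names chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "repository" (path -> file text), used so the
# sketch is self-contained. A real subagent would read from disk.
REPO = {
    "app/auth.py": "import hashlib\n\ndef login(user, pw):\n    return check(user, pw)\n",
    "app/db.py": "def connect():\n    pass\n\ndef check(user, pw):\n    return True\n",
}

def grep(path, needle):
    """Read-only tool: return (path, start_line, end_line) per matching line."""
    hits = []
    for lineno, line in enumerate(REPO[path].splitlines(), start=1):
        if needle in line:
            hits.append((path, lineno, lineno))
    return hits

def explore(needle):
    """Fan out one read-only call per file in parallel, then merge the results.

    Only compact locations are returned, never file contents, so the
    task-solving agent receives a small payload.
    """
    with ThreadPoolExecutor() as pool:
        per_file = list(pool.map(lambda p: grep(p, needle), sorted(REPO)))
    return [hit for hits in per_file for hit in hits]

result = explore("check(")
print(result)
```

Because the tools never write, the calls are safe to run concurrently, and the caller decides afterwards which of the returned ranges to open.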

media Latent Space · 3d ago

AI Red Teaming and Prompt Injection Risks Explained

Zico Kolter and Matt Fredrikson, co-authors of the definitive paper on indirect prompt injections and authorities on the Mythos model, discuss the growing risks of AI security. They highlight that AI systems require a distinct security mindset, with agents introducing new vulnerabilities, and that specialized red-teaming AI can outperform humans in breaking models, making AI prompt injection breaches increasingly likely.

lab Claude Code Releases · 3d ago

Claude v2.1.186 Release Notes

Claude v2.1.186 adds CLI authentication commands for MCP servers, status filtering in workflows, and a "Skills" section in plugin settings. It includes numerous bug fixes for UI, session management, and agent behavior, along with improvements to YAML parsing, memory handling, and tool validation.

media MarkTechPost · 3d ago

Sakana AI Launches Sakana Fugu: Multi-Agent Orchestration Model

Sakana AI has launched Sakana Fugu, an orchestration model that routes tasks across a swappable pool of frontier LLMs via a single OpenAI-compatible API. Fugu Ultra outperforms individual models on key benchmarks like SWE Bench Pro and GPQA-D, and the system demonstrates superior performance on complex, multi-step tasks such as auto-research, Rubik's Cube solving, and blindfold chess.

media r/LocalLLaMA · 3d ago

TMax: A Simple Recipe for Terminal Agents

TMax introduces TMax-15k, a dataset of 14,600 RL environments, over 2.5× larger than the next-largest open terminal dataset. It also presents a simple RL recipe that trains open models from 2B to 27B parameters, with TMax-9B achieving 27.2% on Terminal Bench 2.0 and TMax-27B reaching 42.7%.

media r/LocalLLaMA · 3d ago

Same model, same prompt, 4 different agents produce varied code quality

A self-hosted Qwen3.6-27B model with identical prompt and hardware generated four different HTML/JavaScript solar system simulations. The agent scaffolding significantly influenced output: opencode produced clean, stable code with accurate physics; pi showed robustness and coordinate consistency; hermes offered visually appealing but physically flawed results; qwen code generated minimal, crude code. The results highlight how agent design shapes code quality, correctness, and stability despite shared model and prompt.

media Interconnects · 3d ago

GLM-5.2 is the step change for open agents

GLM-5.2, an open-weight AI model released by Z.ai, has set a new benchmark in coding and general agent performance. It outperforms models like Claude Fable 5 and Gemini, and matches or exceeds Opus 4.8 in max thinking mode, establishing itself as the first open model that feels right in coding harnesses as a general agent.

media r/LocalLLaMA · 3d ago

I Built a Tool to Stop Manually Swapping Models on My 8GB GPU

I developed Prompt-Chain, a Streamlit app that chains a small Prompter model with a large Coder model into a single pipeline. It automatically swaps VRAM when transitioning from prompt refinement to code generation, eliminating manual model switching and reducing wasted tokens from poorly worded prompts.
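The two-stage chaining idea can be sketched as below, assuming a setup where only one model fits in memory at a time. This is not Prompt-Chain's code: `StubModel`, `run_stage`, `chain`, and the `loaded` list are hypothetical stand-ins, and the "VRAM swap" is simulated by tracking which model is resident.

```python
class StubModel:
    """Hypothetical stand-in for a locally loaded model."""

    def __init__(self, name):
        self.name = name

    def generate(self, text):
        # A real model would run inference here; the stub just tags the text.
        return f"[{self.name}] {text}"

# Names of models currently "resident in VRAM" (simulated).
loaded = []

def run_stage(name, text):
    """Load one model, run it, and always unload it before returning."""
    loaded.append(name)
    # On a small GPU, at most one model may be resident at a time.
    assert len(loaded) == 1, "previous model was not unloaded"
    try:
        return StubModel(name).generate(text)
    finally:
        loaded.remove(name)

def chain(user_prompt):
    """Refine the prompt with the small model, then hand it to the large one."""
    refined = run_stage("prompter", user_prompt)
    return run_stage("coder", refined)

output = chain("make a todo app")
print(output)
```

The `try`/`finally` in `run_stage` is the design choice that matters: the first model is released even if generation fails, so the second stage never starts with memory still occupied.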

media r/LocalLLaMA · 3d ago

Ling and Ring 2.6 Technical Report Releases Trillion-Parameter Models

The Ling and Ring 2.6 release includes base models for Ling-2.6-1T and Ling-2.6-flash, both available on Hugging Face. The Ling-2.6-flash model (100B parameters) targets fast inference for users with 24-32GB of VRAM and is also described as offering high throughput for CPU-only inference with 32GB of RAM.

media MarkTechPost · 3d ago

The 7 Types of Agent Memory: A Technical Guide

Large language models are stateless by default, requiring memory mechanisms to retain context across interactions. The seven types of agent memory—working, semantic, episodic, procedural, retrieval, parametric, and prospective—categorize memory by form and duration, enabling agents to plan, learn, and act over time. Each type serves distinct use cases, from storing user preferences to scheduling future goals, and together they form a comprehensive system for long-horizon, context-aware AI agents.
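One way to picture the taxonomy is a store keyed by memory type. The sketch below is a minimal illustration, not taken from the guide: the `MemoryType` enum and `AgentMemory` class are hypothetical, and filing a user preference under semantic memory and a future goal under prospective memory is an assumption made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum

class MemoryType(Enum):
    """The seven memory types named in the summary above."""
    WORKING = "working"
    SEMANTIC = "semantic"
    EPISODIC = "episodic"
    PROCEDURAL = "procedural"
    RETRIEVAL = "retrieval"
    PARAMETRIC = "parametric"
    PROSPECTIVE = "prospective"

@dataclass
class AgentMemory:
    """Hypothetical store holding one list of entries per memory type."""
    items: dict = field(default_factory=lambda: {t: [] for t in MemoryType})

    def remember(self, kind, content):
        self.items[kind].append(content)

    def recall(self, kind):
        # Return a copy so callers cannot mutate the store by accident.
        return list(self.items[kind])

memory = AgentMemory()
# Assumed mapping: preferences as semantic memory, future goals as prospective.
memory.remember(MemoryType.SEMANTIC, "user prefers dark mode")
memory.remember(MemoryType.PROSPECTIVE, "send report on Friday")
print(memory.recall(MemoryType.SEMANTIC))
```

A real system would back each type with a different mechanism (a context window, a vector index, model weights, a scheduler); a single dictionary only shows how the categories partition what an agent retains.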

media Hugging Face Forums · 3d ago

The Clockwork Dark: A Local-First AI Narrative-RPG Engine

The Clockwork Dark is a local-first, AI-driven narrative-RPG engine that uses a deterministic state machine to resolve all game mechanics. It features two autonomous LLMs that narrate the story, with one acting as a patient world voice and the other as an unreliable, godlike assistant. The game offers players a choice: fight the encroaching supernatural corruption or embrace a quiet life in a bakery, with both paths considered valid endings.

media AI News (smol.ai) · 4d ago

GLM-5.2 Breakout and Open-Model Progress Highlighted

Zhipu's GLM-5.2 emerged as the top open-weight model, praised for its frontier-adjacent performance in daily use, with improvements in coding tasks and reduced 1M-token inference cost via IndexShare. It outperformed other open models in agentic knowledge work benchmarks, reaching 1266 Elo in Artificial Analysis' AA-Briefcase test, though only 3% of tasks were fully satisfied by top models, indicating persistent challenges in real-world long-horizon agent performance.

lab Google DeepMind Blog · 4d ago

AI Control Roadmap for Internal System Security

An AI Control Roadmap has been introduced to secure internal systems by integrating traditional safeguards with real-time monitoring capabilities.

media AI News (smol.ai) · 4d ago

GLM-5.2 Emerges as Leading Open-Weight Coding Model

GLM-5.2 is widely regarded as the first open-weight coding model that rivals frontier models like Opus 4.8 and GPT-5.5 in capability. Practitioners highlight its strong tool use, long-horizon planning, and autonomous subagent behavior, with consensus that it now credibly operates in the frontier SWE range. The model's emergence underscores the growing value of open weights for provider competition, on-prem deployment, and reduced vendor lock-in.

lab NVIDIA Technical Blog · 4d ago

NVIDIA Launches XR AI for AR Glasses and Wearable Devices

NVIDIA introduces XR AI to bridge the infrastructure gap for developers building AI experiences on AR glasses and XR devices. The solution enables integration of live sensor streams, multimodal AI models, and enterprise data within device-specific runtimes, streamlining AI agent development for wearables.

media r/LocalLLaMA · 4d ago

Sandboxing code execution for AI agents

A discussion of effective sandboxing methods for AI agents that execute arbitrary code compares Docker containers, microVMs, WASM, and host-level execution. The post lists requirements for isolation, fast startup, network access control, and a filesystem that persists across executions, and asks readers to share their implementations and the tradeoffs they accepted.

media r/LocalLLaMA · 4d ago

I mapped every agent config file and tagged real adoption

A guide lists 21 agent configuration conventions across 11 categories, tagged as adopted, emerging, or proposed. The guide includes real examples from public repositories and explicitly notes hype, such as llms.txt being widely published but unconfirmed by major providers.

media r/LocalLLaMA · 4d ago

Proposal for splitting base models to avoid retraining

A proposal suggests splitting model architecture into a stable base model and lightweight, swappable worker models. The base model handles core reasoning and acts as a platform, while worker models provide domain-specific knowledge through runtime hot-plugging, similar to LoRA but for knowledge rather than behavior.

media r/LocalLLaMA · 4d ago

Watch local LLMs escape the rooms you design

A new tool allows users to design escape room-style environments and watch local LLMs navigate and escape using simple actions. The project, built for Hugging Face x Gradio's 'Build Small' hackathon, supports five model presets and enables custom map creation with font-based visuals and JSON import/export. It uses a 'Think then Act' framework to enable small models to perform reliably in structured game environments.

media r/LocalLLaMA · 4d ago

AllenAI releases MolmoMotion vision models for future motion prediction

AllenAI has released two MolmoMotion models that predict 3D point trajectories based on short video histories and natural-language instructions. One model uses a three-frame history, the other a one-frame history, enabling future motion forecasting for objects in 3D space.