AI agents — korshunov.ai

Research Project: Injecting Natural-Language Tactical Intent into Multi-Agent Football Policies

A research project explores using natural-language tactical instructions from humans to guide autonomous AI agents in a football simulation. The system enables human coaches to issue high-level directives like 'press aggressively' or 'exploit the left side', which the AI agents then adapt to in real time within a dynamic, team-based environment.

media r/LocalLLaMA · 5d ago

Local AI for Local Office Files

A Reddit user asks which AI agent is best for handling local office files like Excel, PDF, Word, and JSON. The post seeks user experiences and implemented workflows for such tasks.

media r/LocalLLaMA · 5d ago

Tool calling issue in open-source Qwen3.6 27B 8K

Users report that the Qwen3.6 27B 8K model occasionally stops processing after generating a tool call, especially when the user steps away. As a workaround, manually pasting the tool call back into the prompt lets the model resume execution. The tool call involves a bash function to find passing tests in a codebase.

media r/LocalLLaMA · 5d ago

Local Agent Web Access via SearXNG and Scrapling

A local agent can access the web without paid APIs by using self-hosted SearXNG for search and Scrapling with Trafilatura for page extraction. The setup avoids vendor dependencies, uses open-source tools, and delivers search results and page content in Markdown format, with fallbacks for CAPTCHAs and security challenges.
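
The search half of such a setup can be sketched as below. This is a minimal illustration, not the poster's implementation: it assumes a SearXNG instance at a placeholder address with the `json` output format enabled in its settings, and the function names are hypothetical. The page-extraction step with Scrapling and Trafilatura, and the CAPTCHA fallbacks, are left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(base_url: str, query: str) -> str:
    """Build a SearXNG search URL that asks for JSON output."""
    params = urlencode({"q": query, "format": "json"})
    return f"{base_url.rstrip('/')}/search?{params}"


def results_to_markdown(payload: dict, limit: int = 5) -> str:
    """Render the 'results' list of a SearXNG JSON response as a Markdown list."""
    lines = []
    for item in payload.get("results", [])[:limit]:
        url = item.get("url", "")
        title = item.get("title") or url  # fall back to the URL if no title
        snippet = (item.get("content") or "").strip()
        lines.append(f"- [{title}]({url})")
        if snippet:
            lines.append(f"  {snippet}")
    return "\n".join(lines)


def search(base_url: str, query: str, limit: int = 5) -> str:
    """Query the instance and return Markdown (needs a reachable SearXNG server)."""
    with urlopen(build_search_url(base_url, query), timeout=10) as response:
        return results_to_markdown(json.load(response), limit)
```

With an instance running locally, `search("http://localhost:8080", "kv cache quantization")` would return a short Markdown list that an agent can read directly; the address is only an example.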

media r/LocalLLaMA · 5d ago

Local agent on 4090 - looking for LM Studio settings

A user reports slow token generation when running a local agent on a 4090 with 24GB VRAM, despite adjusting context and batching settings. They note Gemma4 performs faster but emits stray tokens such as </tool_call>, and seek recommended settings and explanations for parameters such as top_p and top_k.
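
As a rough illustration of what top_k and top_p do, the sketch below filters a toy next-token distribution. It is a simplified model under stated assumptions, not LM Studio's code: top_k keeps the k most likely tokens, top_p then keeps the smallest prefix whose cumulative probability reaches the threshold, and the survivors are renormalized. Real samplers differ in details such as ordering and intermediate renormalization, and the function name is hypothetical.

```python
def filter_top_k_top_p(probs: dict[str, float], top_k: int = 0, top_p: float = 1.0) -> dict[str, float]:
    """Return the renormalized distribution left after top-k and top-p filtering.

    top_k <= 0 disables the top-k cut; top_p = 1.0 disables the nucleus cut.
    """
    # Rank tokens from most to least likely.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]

    # Keep the smallest prefix whose cumulative probability reaches top_p.
    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


toy = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}
print(filter_top_k_top_p(toy, top_k=2))    # only "the" and "a" survive
print(filter_top_k_top_p(toy, top_p=0.7))  # nucleus also stops after "a"
```

Lower values of either parameter shrink the candidate set, which makes output more predictable at the cost of variety.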

blog Simon Willison · 5d ago

Sean Lynch on MCP's Auth Flow Isolation

Sean Lynch highlights that the Model Context Protocol (MCP) offers a key advantage by isolating authentication flows outside the agent's context window. He suggests the ideal form of MCP could be a simple auth gateway for APIs, which would still represent a significant improvement.

media r/LocalLLaMA · 5d ago

Best Local Agents - Jun 2026

A discussion thread identifies the best local AI agents available today, emphasizing open-weight models and local hardware execution. The post defines 'agents' as autonomous software that self-determines actions without pre-programming, distinguishing them from tools like IFTTT or Apple Shortcuts. Its rules require local deployment and make open-source agent software the primary focus.

media r/LocalLLaMA · 5d ago

Help Running Local Hermes Agent with llama-cpp

A user reports issues running a local Hermes AI agent on a high-end rig using self-compiled llama-cpp. The setup reprocesses the KV cache every 5 messages and reasons slowly, and the agent repeatedly pauses to report progress instead of continuing autonomously. The user asks whether their llama-cpp parameters are wrong and which adjustments would improve agent performance and allow sustained reasoning without interruptions.

media r/LocalLLaMA · 5d ago

Improving local models with an API-based consultant agent

A user asks whether adding a powerful API-based 'consultant' agent, such as GLM 5.2, could enhance local AI workflows by refining plans and learning processes. The post explores the potential benefits of such an agent in improving local model performance through external consultation.

media r/LocalLLaMA · 6d ago

Best Harness for Web Searching

Users report that tools like LM Studio and Odysseus are limited by search engine request caps, often 10 per day or per hour, when no API access is configured. They suggest creating DuckDuckGo API accounts for better search access, but note that frontends rarely prompt for this. The post asks whether Hermes or Pi offer improved solutions.

media r/LocalLLaMA · 6d ago

Watching a Local AI Voice Assistant Get Dumber

A test on an RTX 5060 Ti showed that reducing a local AI voice assistant's model size from 9B to 0.8B leads to a sharp decline in capability. The 9B model handles tool orchestration well, while smaller models show increasing failures: the 4B model skips tool calls and guesses facts, the 2B model suffers semantic drift, and the 0.8B model fails to operate agent functions, triggering wrong APIs or infinite loops.

media r/LocalLLaMA · 6d ago

New Agentic Benchmark Released

Artificial Analysis has introduced a new agentic benchmark that evaluates large language models' ability to plan and execute tasks. Claude Fable and GLM 5.2 achieved top positions within their respective cohorts, demonstrating strong performance on this unsaturated benchmark.

media r/LocalLLaMA · 6d ago

Multi Doc Agent Workflows in Word

A blog post details how to implement multi-document agent workflows in Microsoft Word using local LLMs. The guide outlines steps to enable agents to process and interact with multiple documents within a single Word environment.

media r/LocalLLaMA · 6d ago

Ohio State University releases open-source Deep Research agent QUEST-35B

Ohio State University's NLP team has released QUEST-35B, an open-source Deep Research agent trained on approximately 32 H100 GPUs using 8,000 synthetic samples. The team open-sourced the training recipe, code, weights, and datasets, with benchmark results showing competitive performance compared to leading closed-source Deep Research systems.

github OpenAI Agents SDK · 6d ago

v0.17.6 Release Notes

The v0.17.6 release adds pre-approval tool input guardrails and SDK-only custom data for tool outputs. It also enforces a strict JSON-compatible contract for tool outputs and suppresses unnecessary whitespace warnings in tool names. @siddiksawani made their first contribution in this release.
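
To give a sense of what a JSON-compatible contract for tool outputs means in practice, here is a generic check written with the Python standard library. It is an assumption-laden sketch and not the SDK's implementation or API: the function name is hypothetical, and `json.dumps` is more lenient than a truly strict contract (it converts tuples to lists and coerces some non-string dictionary keys).

```python
import json
from typing import Any


def ensure_json_compatible(output: Any) -> str:
    """Serialize a tool output, rejecting values that JSON cannot represent.

    Hypothetical helper for illustration only; it is not part of any SDK.
    """
    try:
        # allow_nan=False rejects NaN and infinity, which are not valid JSON.
        return json.dumps(output, allow_nan=False)
    except (TypeError, ValueError) as exc:
        raise ValueError(f"tool output is not JSON-compatible: {exc}") from exc


print(ensure_json_compatible({"ok": True, "items": [1, 2]}))
```

Values such as sets, arbitrary objects, or NaN floats raise a `ValueError` here instead of being passed along silently.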

arxiv arXiv cs.AI · 6d ago

DataMagic Turns Tabular Data into Interactive Insight Videos

DataMagic transforms raw tabular data and natural language queries into narrative data-insight videos. It uses DVSpec to ensure data fidelity by linking visual elements to data fields via semantic references, and employs a multi-agent architecture to generate and orchestrate coherent video scenes. The system supports interactive exploration and provenance-based data Q&A, enabling users to engage with data beyond static views.

arxiv arXiv cs.AI · 6d ago

NRT-Bench: Multi-turn Red-teaming of LLM Agents in Safety-Critical Systems

NRT-Bench introduces a benchmark for multi-turn red-teaming of LLM agents operating in a simulated nuclear power plant. Across four frontier operator models, 8.7% to 12.1% of attack sessions result in loss of a critical safety function, with vulnerabilities largely disjoint across models. The effectiveness of defences also depends strongly on the model.

arxiv arXiv cs.AI · 6d ago

Defensive Misdirection Against Automated Attacks on Agentic AI

Agentic AI systems face growing threats from model-guided automated attacks. A new defense strategy, Contextual Misdirection via Progressive Engagement (CMPE), reduces attacker success rates by up to two orders of magnitude and nearly eliminates verified attack success in benchmark tests.

arxiv arXiv cs.AI · 6d ago

UltraQuant: 4-bit KV Caching for Context-Heavy Agents

UltraQuant enables 4-bit KV caching for context-heavy agents, reducing P50 time-to-first-token by 3.47x in late rounds and boosting output throughput by 1.63x over an FP8 KV baseline. It achieves this using FP8 queries, FP4 KV tensors, UE8M0 group scales, and native scaled-MFMA on AMD CDNA4 GPUs, with optimizations for decode-attention kernels and robust design choices like asymmetric K/V treatment and Walsh-Hadamard rotation.
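
Two of the ingredients named above, Walsh-Hadamard rotation and 4-bit values with power-of-two group scales, can be sketched in plain Python. This is a toy model under several assumptions and not the paper's kernels: FP4 is taken to be the E2M1 grid, the group scale is stored as a plain integer exponent (a real UE8M0 byte is a biased exponent), the scale is rounded up so nothing clips, and FP8 queries and asymmetric K/V handling are not modeled. All function names are hypothetical.

```python
import math

# Magnitudes representable in FP4 (E2M1); the sign is kept separately.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two.

    Applying it twice returns the input, so it also serves as its own inverse.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(x)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    norm = 1.0 / math.sqrt(n)
    return [v * norm for v in out]


def quantize_group(values):
    """Quantize one group to FP4 magnitudes with a shared power-of-two scale."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0, [0.0] * len(values)
    # Smallest power-of-two scale that maps the largest value into the grid.
    exponent = math.ceil(math.log2(max_abs / FP4_GRID[-1]))
    scale = 2.0 ** exponent
    codes = []
    for v in values:
        nearest = min(FP4_GRID, key=lambda g: abs(abs(v) / scale - g))
        codes.append(math.copysign(nearest, v))
    return exponent, codes


def dequantize_group(exponent, codes):
    """Map FP4 codes back to real values using the group's exponent."""
    scale = 2.0 ** exponent
    return [c * scale for c in codes]


# Rotate, quantize, dequantize, rotate back.
key_vector = [0.9, -2.3, 4.7, 0.1]
exponent, codes = quantize_group(fwht(key_vector))
restored = fwht(dequantize_group(exponent, codes))
print(exponent, codes, restored)
```

The rotation spreads a single large entry across the whole group before quantization, which is the usual motivation for pairing it with low-bit formats; whether that matches the paper's exact reasoning is not stated in the summary.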