AI agents — korshunov.ai

Topic · AI agents

Introducing Claude Tag for Slack Teams

Claude Tag allows teams to tag @Claude in Slack to delegate tasks, with access to selected channels, tools, and codebases. It learns from channel context, works asynchronously, and takes initiative by proactively updating users on relevant information. Today, 65% of Anthropic’s product team code is created by internal Claude Tag, and it’s now available in beta for Claude Enterprise and Team customers.

lab Google DeepMind Blog · 9h ago

Gemini 3.5 Flash Adds Computer Use Capability

Google has introduced computer use in Gemini 3.5 Flash, enabling the model to execute code and interact with external tools. This feature allows users to run programming tasks and access real-time information through integrated computing functions.

lab Mistral AI News · 13h ago

New Connector Controls for Enterprise Security and Access

Mistral Studio now offers enriched admin controls to govern connector access per workspace and tool, enabling fine-grained permissions. Features include API keys with scopes, multi-account connectors, and a new Connectors Debugger for root cause analysis, all supporting secure, auditable integration with enterprise systems.

lab OpenAI News · 2d ago

Omio builds AI-native conversational travel

Omio leverages OpenAI to enhance conversational travel experiences. The company uses AI to accelerate product development and transition into an AI-native business model.

lab Claude Code Releases · 1d ago

Claude v2.1.187 Release Notes

Claude v2.1.187 introduces sandbox credentials blocking, org-configured model restrictions, mouse click support in fullscreen, and fixes for command failures, tool hangs, and UI stability. Updates also improve structured output handling, agent depth tracking, and plugin management, with enhancements to VSCode and terminal compatibility.

lab Claude Code Releases · 2d ago

Claude v2.1.186 Release Notes

Claude v2.1.186 adds CLI authentication commands for MCP servers, status filtering in workflows, and a "Skills" section in plugin settings. It includes numerous bug fixes for UI, session management, and agent behavior, along with improvements to YAML parsing, memory handling, and tool validation.

lab Google DeepMind Blog · 3d ago

AI Control Roadmap for Internal System Security

An AI Control Roadmap has been introduced to secure internal systems by integrating traditional safeguards with real-time monitoring capabilities.

github AutoGPT · 6d ago

autogpt-platform-beta-v0.6.64 Released

The autogpt-platform-beta-v0.6.64 release, dated 18th June 2026, introduces new features such as the AutoPilot Context Panel and Global Search, along with enhancements to graph saving, caching, and builder performance. It also includes security hardening, bug fixes for LLM provider issues, and UI improvements like a high-resolution touch icon.

lab Claude Code Releases · 8d ago

Claude v2.1.178 Release Notes

Claude v2.1.178 introduces new permission rules using Tool(param:value) syntax, improved workflow and skill loading in nested directories, and enhanced auto mode and error messaging. It fixes critical issues including crashes, authentication errors, and UI behavior in Chrome and VSCode, while refining tool prompts and undo functionality.

lab Claude Code Releases · 5h ago

Claude Code v2.1.191 Release Notes

Claude Code version 2.1.191 introduces /rewind support, allowing users to resume conversations from before a /clear command was executed. The update fixes several critical issues, including background agents resurrecting after being stopped and scroll position jumping during streaming responses. It also corrects behavior where /voice displayed generic error messages and where /login URLs were truncated in Windows Terminal. Significant improvements enhance reliability for MCP servers by adding retry logic for transient network errors during capability discovery and OAuth flows. Headless environments now skip browser popups for OAuth, while sandbox network permissions are remembered for the session duration. Performance optimizations reduce CPU usage during streaming by approximately 37% through text update coalescing and mitigate long-session memory growth from the terminal output cache.

github CrewAI · 1d ago

CrewAI 1.14.8a3 Release Notes

CrewAI 1.14.8a3 introduces unified declarative flow loading and improved startup UX for crew runs. It consolidates crewai run and flow kickoff commands, adds declarative Flow CLI support, and enables @router() as a flow start method with typed output schemas for tools.

lab Hugging Face Blog · 2d ago

Build Real Agentic Apps with CUGA: 24 Working Examples

CUGA introduces a lightweight harness enabling developers to build real agentic applications. It includes 24 working examples demonstrating practical implementations across various use cases.

arxiv arXiv cs.CL · 2d ago

OpenBioRQ: Benchmark for Agentic Biomedical Research Faithfulness

OpenBioRQ introduces a benchmark of 12,553 unsolved biomedical research questions across 12 domains, designed to test agentic models' faithfulness and abstention. It evaluates models in a tool-using setting without answer keys, using real follow-up evidence rather than parametric knowledge, and reveals significant agentic collapse on the hardest questions where tools are no longer used despite being critical.

arxiv arXiv cs.CL · 2d ago

LRE: Few-Kilobytes Agent Memory with Zero Neural Cost

LRE is a CPU-only, language-model-free system that learns which interaction history units are load-bearing. It outperforms baselines in accuracy-cost balance, reducing peak context size by up to 52% and improving task completion by 37% in some cases. LRE achieves superior answer quality with 68% fewer tokens and requires no annotations or neural computation for training.

arxiv arXiv cs.CL · 2d ago

Beaver: Agent Harness for Scientific Curation from Multimodal Sources

Beaver is an agent harness that extracts structured information from scientific papers by integrating multimodal evidence tooling, task scaffolding, and artifact-grounded autoresearch. It achieves 81.0 on the Gold-Referenced Attribute Score, outperforming frontier agents by over 23 points, with key gains on high-value attributes requiring cross-modal reasoning.

media MarkTechPost · 2d ago

Sakana AI Launches Sakana Fugu: Multi-Agent Orchestration Model

Sakana AI has launched Sakana Fugu, an orchestration model that routes tasks across a swappable pool of frontier LLMs via a single OpenAI-compatible API. Fugu Ultra outperforms individual models on key benchmarks like SWE Bench Pro and GPQA-D, and the system demonstrates superior performance on complex, multi-step tasks such as auto-research, Rubik's Cube solving, and blindfold chess.

github OpenAI Agents SDK · 5d ago

v0.17.6 Release Notes

The v0.17.6 release adds pre-approval tool input guardrails and SDK-only custom data for tool outputs. It also enforces a strict JSON-compatible contract for tool outputs and suppresses unnecessary whitespace warnings in tool names. @siddiksawani made their first contribution in this release.

arxiv arXiv cs.AI · 6d ago

NRT-Bench: Multi-turn Red-teaming of LLM Agents in Safety-Critical Systems

NRT-Bench introduces a benchmark for multi-turn red-teaming of LLM agents operating in a simulated nuclear power plant. Across four frontier operator models, 8.7% to 12.1% of attack sessions result in loss of a critical safety function, with vulnerabilities largely disjoint across models. The effectiveness of defences varies significantly by model, showing strong model dependence.

arxiv arXiv cs.AI · 6d ago

Defensive Misdirection Against Automated Attacks on Agentic AI

Agentic AI systems face growing threats from model-guided automated attacks. A new defense strategy, Contextual Misdirection via Progressive Engagement (CMPE), reduces attacker success rates by up to two orders of magnitude and nearly eliminates verified attack success in benchmark tests.

arxiv arXiv cs.AI · 6d ago

UltraQuant: 4-bit KV Caching for Context-Heavy Agents

UltraQuant enables 4-bit KV caching for context-heavy agents, reducing P50 time-to-first-token by 3.47x in late rounds and boosting output throughput by 1.63x over FP8 KV baseline. It achieves this using FP8 queries, FP4 KV tensors, UE8M0 group scales, and native scaled-MFMA on AMD CDNA4 GPUs, with optimizations for decode-attention kernels and robust design choices like asymmetric K/V treatment and Walsh-Hadamard rotation.