AI agents — korshunov.ai


Aiden Mobile Agent Prototype in the Making

Aiden is a physical AI agent device that watches a phone's screen over HDMI and controls the phone through USB HID, enabling app automation without jailbreaking the phone or installing software on it. It supports bring-your-own LLMs, runs without backend infrastructure or data collection, and is released as an open-source development board under the AGPL.
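USB HID lets a device present itself to a phone as an ordinary input peripheral, which is why nothing has to be installed on the phone. The summary does not say which HID device types Aiden emulates, so as a generic illustration only, here is a minimal sketch of the standard 8-byte boot-protocol keyboard report (one modifier byte, one reserved byte, up to six key usage IDs). The helper names are hypothetical and not taken from the Aiden project.

```python
# Usage IDs from the USB HID Usage Tables, Keyboard/Keypad page (0x07).
KEY_A = 0x04           # 'a' through 'z' are 0x04 through 0x1D
KEY_ENTER = 0x28
MOD_LEFT_SHIFT = 0x02  # bit 1 of the modifier byte


def keyboard_report(modifiers: int = 0, keys=()) -> bytes:
    """Build an 8-byte boot-protocol keyboard input report."""
    keys = list(keys)
    if len(keys) > 6:
        raise ValueError("a boot-protocol report carries at most 6 keys")
    return bytes([modifiers, 0x00] + keys + [0x00] * (6 - len(keys)))


def press_and_release(keycode: int, modifiers: int = 0) -> list:
    """A key press is one report with the key down, then an all-zero report."""
    return [keyboard_report(modifiers, [keycode]), keyboard_report()]


def type_lowercase(text: str) -> list:
    """Reports that type a string of lowercase ASCII letters."""
    reports = []
    for ch in text:
        if not "a" <= ch <= "z":
            raise ValueError(f"unsupported character: {ch!r}")
        reports.extend(press_and_release(KEY_A + ord(ch) - ord("a")))
    return reports


print([r.hex() for r in type_lowercase("hi")])
```

Releasing keys with an all-zero report matters: a host that never sees the release treats the key as held down.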

arxiv arXiv cs.AI · 18h ago

Grounded Scaling: Determinism as a Core Limit in Agentic AI

Agentic AI performance degrades exponentially with task length in non-deterministic environments: when per-step determinism δ < 1, k-step success falls as δ^k. The paper introduces a framework linking environment determinism to task success, verifiability, and skill evolution, and proposes a Supply Certainty Index and a five-level Determinism Maturity Model. It challenges prevailing views by identifying determinism as a binding constraint across compute, data, embodiment, and alignment.
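The δ^k figure assumes each of k steps succeeds independently with probability δ. A minimal Python sketch of that arithmetic (the function names are ours, not the paper's) also computes the longest task that still clears a target success rate, k_max = ⌊ln(target) / ln δ⌋:

```python
import math


def k_step_success(delta: float, k: int) -> float:
    """Success probability of a k-step task when every step independently
    succeeds with probability delta (the delta**k model from the summary)."""
    if not 0.0 <= delta <= 1.0 or k < 0:
        raise ValueError("need 0 <= delta <= 1 and k >= 0")
    return delta ** k


def max_steps_for_target(delta: float, target: float) -> int:
    """Largest k with delta**k >= target, for 0 < delta < 1 and 0 < target < 1."""
    if not (0.0 < delta < 1.0 and 0.0 < target < 1.0):
        raise ValueError("need 0 < delta < 1 and 0 < target < 1")
    return math.floor(math.log(target) / math.log(delta))


for delta in (0.99, 0.95, 0.90):
    rates = [round(k_step_success(delta, k), 3) for k in (10, 50, 100)]
    print(f"delta={delta}: success at k=10/50/100 -> {rates}; "
          f"longest task with >= 50% success: {max_steps_for_target(delta, 0.5)} steps")
```

Even at δ = 0.99, a 100-step task succeeds only about 37% of the time, and the longest task with at least a 50% success rate is 68 steps.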

arxiv arXiv cs.AI · 19h ago

Gazer: Training-Free Semantic Correction for Autoregressive Visual Models

Gazer introduces a training-free framework that uses multimodal large language model feedback to correct semantic errors in real time during autoregressive visual model generation. By integrating reflective diagnosis and semantic correction stages, Gazer improves compositional accuracy and semantic alignment across multiple models without additional training.

arxiv arXiv cs.AI · 19h ago

MacAgentBench Launches macOS AI Agent Benchmark

MacAgentBench introduces a comprehensive benchmark of 676 tasks across 25 applications; 60% of the tasks involve both GUI and CLI interaction. It uses deterministic rule-based evaluation with fine-grained multi-checkpoint scoring. Claude Opus 4.6 running on OpenClaw reaches 73.7% Pass@1, a result attributed primarily to the skill library rather than to the framework design.
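Pass@1 is the fraction of tasks solved on the first attempt, while multi-checkpoint scoring gives partial credit. A minimal sketch of how both numbers could be derived from per-task checkpoint results; the rule that a task counts as passed only when every checkpoint passes, and the plain averaging of partial credit, are assumptions rather than MacAgentBench's documented scoring, and the task IDs are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TaskResult:
    task_id: str
    checkpoints_passed: int
    checkpoints_total: int

    @property
    def passed(self) -> bool:
        # Assumption: a task is solved only if every checkpoint passes.
        return self.checkpoints_passed == self.checkpoints_total


def pass_at_1(results: Sequence[TaskResult]) -> float:
    """Fraction of tasks solved on their single (first) attempt."""
    return sum(r.passed for r in results) / len(results)


def mean_checkpoint_score(results: Sequence[TaskResult]) -> float:
    """Average partial credit: fraction of checkpoints passed per task."""
    return sum(r.checkpoints_passed / r.checkpoints_total for r in results) / len(results)


results = [
    TaskResult("hypothetical-task-1", 3, 3),
    TaskResult("hypothetical-task-2", 2, 4),
    TaskResult("hypothetical-task-3", 5, 5),
]
print(pass_at_1(results), mean_checkpoint_score(results))
```

The gap between the two numbers shows what fine-grained scoring adds: a task that fails one checkpoint scores 0 for Pass@1 but still earns partial credit.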

media r/LocalLLaMA · 19h ago

Nex-N2-Mini-Ultra-Uncensored-Heretic Model Released

The Nex-N2-Mini-Ultra-Uncensored-Heretic model is now available on Hugging Face in both Safetensors and GGUF formats. It features agentic thinking and scores 5/100 refusals with a KL divergence (KLD) of 0.0020. The creator notes that Heretic 1.2.0 was chosen over 1.4.0 because it did better at avoiding high KLD while keeping refusals low.
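KLD here presumably means the Kullback-Leibler divergence between the modified model's next-token distributions and the original model's, which Heretic reports alongside refusal counts; lower values mean the uncensoring changed the model's behavior less. A minimal sketch of the per-distribution computation; the direction of the divergence and how it is averaged over prompts and tokens in Heretic are not covered here, and the example probabilities are made up.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(P || Q) in nats for two discrete distributions over the same vocabulary.

    p: reference distribution (e.g. the original model's next-token probabilities)
    q: distribution being compared (e.g. the modified model's probabilities)
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:  # terms with p_i = 0 contribute nothing
            total += pi * math.log(pi / max(qi, eps))  # eps guards against q_i = 0
    return total


original = [0.70, 0.20, 0.10]
modified = [0.68, 0.21, 0.11]
print(f"KL = {kl_divergence(original, modified):.4f} nats")
```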

arxiv arXiv cs.AI · 21h ago

PaperClaw: Autonomous Research with Human-in-the-Loop Refinement

PaperClaw is a multi-agent system that autonomously conducts research from field selection to paper publication. It uses a validated, iterative propose-test-reflect loop, grounded in real references and runnable results, and supports human-in-the-loop refinement at any stage. Evaluation shows it produces strong papers both autonomously and with human oversight.

arxiv arXiv cs.LG · 22h ago

DataClaw0: Agentic Tailoring of Multimodal Data from Raw Streams

DataClaw0 introduces an agentic paradigm for actively refining raw multimodal data to align with user and downstream intents. It uses a two-stage pipeline grounded in factual anchors to generate a large-scale dataset across five domains and combines supervised fine-tuning with GRPO to achieve strong alignment with complex refinement tasks. Evaluated on video generation, VQA, and GUI navigation, DataClaw0 produces high-information-density tailored data, enabling efficient model adaptation with minimal training data.
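GRPO (Group Relative Policy Optimization) replaces a learned value baseline with a group-relative one: several responses are sampled for the same prompt, and each response's advantage is its reward normalized by the group's mean and standard deviation. A minimal sketch of that advantage step only; the policy-gradient loss, clipping, and KL penalty are omitted, and nothing here is specific to DataClaw0's training setup.

```python
import statistics
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Normalize each sampled response's reward against its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mean) / (std + eps) for r in rewards]


# Rewards for four responses sampled from the same prompt.
print(group_relative_advantages([1.0, 0.0, 0.5, 0.5]))
```

Responses better than their group's average get positive advantages and worse ones negative. When every reward in a group is equal, all advantages are zero, so that group contributes no learning signal.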

arxiv arXiv cs.LG · 22h ago

Neural Action Codec for Vision-Language-Action Models

NAC, an architecture inspired by neural audio codecs, compresses robot action trajectories as multi-channel 1D signals using multi-scale residual vector quantization. By replacing mel-spectrogram losses with time-domain and non-mel spectral reconstruction losses, NAC achieves high-fidelity action encoding with minimal architectural changes, and it outperforms existing action tokenizers in both reconstruction error and task success rate on real-world manipulation tasks.
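Residual vector quantization encodes a vector in stages: each codebook quantizes whatever error the previous stages left behind, and decoding sums the chosen codewords. A minimal sketch on a single 2-D action vector; the codebooks are hand-picked for illustration rather than learned, and the sketch does not model NAC's multi-scale structure or its treatment of trajectories as multi-channel signals.

```python
from typing import List, Sequence

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rvq_encode(x: Sequence[float], codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Return one codeword index per stage; each stage quantizes the current residual."""
    residual = list(x)
    codes = []
    for codebook in codebooks:
        idx = min(range(len(codebook)), key=lambda i: _sq_dist(residual, codebook[i]))
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]  # error left for the next stage
    return codes


def rvq_decode(codes: Sequence[int], codebooks: Sequence[Sequence[Vector]]) -> Vector:
    """Reconstruct by summing the selected codeword from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hand-picked codebooks: a coarse first stage and a finer second stage.
codebooks = [
    [[1.0, 0.0], [0.0, -1.0], [-1.0, 0.0]],
    [[0.0, -0.5], [-0.1, -0.4], [0.1, 0.1]],
]
action = [0.9, -0.4]  # e.g. one timestep of a 2-D action
codes = rvq_encode(action, codebooks)
print(codes, rvq_decode(codes, codebooks))
```

Adding stages refines the reconstruction without enlarging any single codebook, which is why RVQ suits compact tokenizers.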

arxiv arXiv cs.LG · 23h ago

VLA-FAIL: Lightweight Failure Detection for Vision-Language-Action Models

VLA-FAIL introduces a lightweight failure detection framework for vision-language-action models that uses the Mahalanobis distance of last-layer features and action chunk consistency, requiring neither failure data nor expensive action sampling. Combining the two detectors yields reliable, early failure detection across diverse tasks, outperforming baseline methods in both accuracy and efficiency.
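A Mahalanobis-distance detector fits a mean and covariance to features collected from normal (here, successful) rollouts and flags inputs whose features lie far from that distribution. A minimal pure-Python sketch, assuming last-layer features arrive as plain lists of floats; the ridge term, the solver, and the thresholding idea in the note below are our choices, and the action-chunk consistency detector is not shown.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def mean_vector(samples: Sequence[Sequence[float]]) -> List[float]:
    n, d = len(samples), len(samples[0])
    return [sum(s[j] for s in samples) / n for j in range(d)]


def covariance(samples, mu, ridge: float = 1e-6) -> Matrix:
    """Sample covariance with a small ridge on the diagonal so it stays invertible."""
    n, d = len(samples), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for s in samples:
        diff = [s[j] - mu[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j]
    for i in range(d):
        for j in range(d):
            cov[i][j] /= n - 1
        cov[i][i] += ridge
    return cov


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def mahalanobis(x, mu, cov) -> float:
    """sqrt((x - mu)^T cov^-1 (x - mu)), computed via a linear solve instead of an inverse."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    z = solve(cov, diff)
    return math.sqrt(max(sum(d * zi for d, zi in zip(diff, z)), 0.0))


# Hypothetical 2-D "last-layer features" from successful rollouts.
nominal = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
mu = mean_vector(nominal)
cov = covariance(nominal, mu)
print(mahalanobis([1.0, 0.0], mu, cov), mahalanobis([3.0, 0.0], mu, cov))
```

In use, one would compute distances for held-out successful rollouts, pick a threshold such as a high percentile of those distances (a hypothetical choice), and raise a failure flag when a new step's distance exceeds it.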

arxiv arXiv cs.LG · 23h ago

LDT-FRL Framework for Cyber-Resilient IoMT

The LDT-FRL framework introduces a privacy-preserving defense system for Internet of Medical Things (IoMT) devices that combines temporal attention, lightweight digital twins, and federated reinforcement learning. It achieves 99.66% and 99.95% accuracy on the CICDDoS 2019 and TON-IoT benchmarks, respectively, with a perfect F1 score on the MITM class; it converges 81% faster than prior methods and offers interpretable defense decisions via SHAP and Grad-CAM.

arxiv arXiv cs.LG · 23h ago

ASCII Art Enables Text-Only LLMs to Control VLA Systems

A text-only large language model can be adapted into a Vision-Language-Action controller by using ASCII-rendered visual observations. This approach allows LLMs to interpret visual states through text, enabling them to follow natural-language instructions and generate executable actions both in simulation and on physical manipulators.
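One simple way to render an image as ASCII is to map each pixel's brightness onto a ramp of characters of increasing visual density. A minimal sketch, assuming a pre-downsampled grayscale grid with values in [0, 1]; the ramp, the prompt layout, and the `build_prompt` helper are hypothetical and not taken from the paper.

```python
from typing import Sequence

RAMP = " .:-=+*#%@"  # sparse to dense; here denser characters stand for brighter pixels


def to_ascii(gray: Sequence[Sequence[float]], ramp: str = RAMP) -> str:
    """Map a 2-D grid of brightness values in [0, 1] to lines of characters."""
    lines = []
    for row in gray:
        chars = []
        for v in row:
            v = min(max(v, 0.0), 1.0)                     # clamp out-of-range values
            idx = min(int(v * len(ramp)), len(ramp) - 1)  # 1.0 maps to the last character
            chars.append(ramp[idx])
        lines.append("".join(chars))
    return "\n".join(lines)


def build_prompt(instruction: str, gray: Sequence[Sequence[float]]) -> str:
    """Hypothetical prompt layout for a text-only LLM acting as a VLA controller."""
    return (
        "Observation (ASCII, ' ' = dark, '@' = bright):\n"
        f"{to_ascii(gray)}\n\n"
        f"Instruction: {instruction}\n"
        "Next action:"
    )


frame = [
    [0.0, 0.0, 0.1, 0.0],
    [0.0, 0.9, 1.0, 0.0],
    [0.0, 0.8, 0.9, 0.0],
]
print(build_prompt("pick up the bright block", frame))
```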

arxiv arXiv cs.LG · 1d ago

Decoupling Declarative and Procedural Knowledge in Vision-Language-Action Models

w²VLA introduces a modular approach that decouples declarative and procedural knowledge in Vision-Language-Action models. By restructuring information flow, it enables robust behavior cloning and what the authors describe as unprecedented zero-shot skill transfer to unseen, dissimilar objects.

media Hugging Face Forums · 1d ago

I Built an MCP Server in Go for AI Agents - 200 Lines Tutorial

A 200-line Go tutorial demonstrates how to build a lightweight Model Context Protocol (MCP) server, taking advantage of Go's concurrency and simplicity. The server lets AI agents such as Claude access structured data and Go applications, which the author claims could make them 10x more useful.
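MCP servers commonly talk to clients over stdio using newline-delimited JSON-RPC 2.0 messages, answering methods such as `initialize`, `tools/list`, and `tools/call`. The tutorial itself is in Go; to keep this page's examples in one language, the minimal sketch below uses Python. It is not a complete or fully spec-compliant MCP server, and the `get_stock` tool and its inventory data are hypothetical.

```python
import json
import sys
from typing import Any, Dict, Optional

# Hypothetical structured data exposed to the agent.
INVENTORY = {"widget": 12, "gadget": 0}

TOOLS = [{
    "name": "get_stock",
    "description": "Return the stock count for an inventory item.",
    "inputSchema": {
        "type": "object",
        "properties": {"item": {"type": "string"}},
        "required": ["item"],
    },
}]


def _error(msg_id: Any, code: int, message: str) -> Dict[str, Any]:
    return {"jsonrpc": "2.0", "id": msg_id, "error": {"code": code, "message": message}}


def handle(msg: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return a JSON-RPC response, or None for notifications (messages without an id)."""
    if "id" not in msg:
        return None  # e.g. notifications/initialized needs no reply
    msg_id, method = msg["id"], msg.get("method")
    if method == "initialize":
        result = {
            "protocolVersion": "2024-11-05",
            "capabilities": {"tools": {}},
            "serverInfo": {"name": "inventory-sketch", "version": "0.1.0"},
        }
    elif method == "tools/list":
        result = {"tools": TOOLS}
    elif method == "tools/call":
        params = msg.get("params", {})
        if params.get("name") != "get_stock":
            return _error(msg_id, -32602, "unknown tool")
        item = params.get("arguments", {}).get("item", "")
        payload = {"item": item, "stock": INVENTORY.get(item)}
        result = {"content": [{"type": "text", "text": json.dumps(payload)}]}
    else:
        return _error(msg_id, -32601, f"method not found: {method}")
    return {"jsonrpc": "2.0", "id": msg_id, "result": result}


def main() -> None:
    """stdio transport: one JSON-RPC message per line on stdin and stdout."""
    for line in sys.stdin:
        if not line.strip():
            continue
        response = handle(json.loads(line))
        if response is not None:
            sys.stdout.write(json.dumps(response) + "\n")
            sys.stdout.flush()
```

To run it as a server, call `main()` at module level or behind an `if __name__ == "__main__":` guard; an MCP client then launches the script and exchanges messages over its stdin and stdout. Keeping `handle` separate from the I/O loop lets the request handling be tested without a client.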

media r/LocalLLaMA · 1d ago

Qwen Releases Qwen-AgentWorld-397B-A17B Model

Qwen has announced a new large language model called Qwen-AgentWorld-397B-A17B. The model appears on Hugging Face and on Qwen's official blog, indicating that it has been publicly released and is available for use.

media r/LocalLLaMA · 1d ago

GitHub Repository: Qwen-AgentWorld for Language World Models

Qwen-AgentWorld is a GitHub repository introducing language world models designed for general-purpose agents. The project aims to enable agents with broader, more realistic world understanding through language-based modeling.

media r/LocalLLaMA · 1d ago

Qwen releases 35B-parameter MoE for agent environment simulation

Qwen has launched Qwen-AgentWorld-35B-A3B, a 35B-parameter MoE model with only about 3B active parameters per token. It is trained to simulate responses from MCP, terminal, software engineering, Android, web, and OS GUI environments by predicting next observations after agent actions, enabling efficient agent training and environment simulation without real tool execution.
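The gap between total and active parameters in a mixture-of-experts (MoE) model comes from top-k expert routing: a router scores every expert for each token, but only the k best-scoring experts are run. The summary gives neither the expert count nor k for Qwen-AgentWorld-35B-A3B, so the four toy experts and k = 2 below are arbitrary; this is a generic sketch, not Qwen's implementation, and it renormalizes the selected gate weights, which not every MoE design does.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Expert = Callable[[List[float]], List[float]]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def top_k_route(router_logits: Sequence[float], k: int) -> Dict[int, float]:
    """Keep the k highest-probability experts and renormalize their gate weights."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in chosen)
    return {i: probs[i] / total for i in chosen}


def moe_forward(x: List[float], experts: Sequence[Expert],
                router_logits: Sequence[float], k: int) -> Tuple[List[float], List[int]]:
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    gates = top_k_route(router_logits, k)
    out = [0.0] * len(x)
    for i, g in gates.items():
        y = experts[i](x)
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, sorted(gates)


# Four toy experts that scale their input; with k = 2 only two run per token.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
out, used = moe_forward([1.0], experts, router_logits=[2.0, 0.1, 1.0, -1.0], k=2)
print(used, out)
```

At 35B total and about 3B active, roughly 9% of the weights are used per token. Shared components such as attention and embeddings run for every token, so the active fraction is not simply k divided by the number of experts.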

arxiv arXiv cs.CL · 1d ago

Are We Ready For An Agent-Native Memory System?

A new study decomposes agent memory into four core modules and evaluates 12 memory systems on five benchmark workloads. It finds that no single architecture dominates: performance depends on how well a system's design matches the workload's bottleneck, and localized maintenance proves more cost-efficient than global reorganization.

arxiv arXiv cs.CL · 1d ago

Micro-Transaction Markets for Verified Product Info in Agentic E-Commerce

The study argues that autonomous agents in e-commerce are limited by a scarcity of trustworthy product information rather than by product matching. It proposes a micro-transaction model in which agents pay fractions of a cent to access verified data such as service histories and test reports, with pricing and trustworthiness scored by reputation. The system prioritizes genuine product quality and real-time information acquisition over chatbot fluency.

arxiv arXiv cs.CL · 1d ago

SHERLOC: Structured Diagnostic Localization for Code Repair Agents

SHERLOC introduces a training-free framework that pairs a reasoning LLM with compact repository tools and self-recovery. It achieves state-of-the-art localization accuracy and recall on SWE-Bench, improving repair agents' resolve rate by 5.95 percentage points while cutting localization token usage by 36.7% and total token usage by 23.1%.

arxiv arXiv cs.CL · 1d ago

Metis: Bridging Text and Code Memory for Self-Evolving Agents

Metis introduces a hierarchical dual-representation memory that combines text and code memory to improve self-evolving agents. It organizes experience into execution plans, facts, and pitfalls, crystallizing reusable plans into validated tools only when justified. Evaluated on AppWorld, Metis achieves up to 20.6% higher task accuracy and 22.8% lower execution cost than ReAct, with better overall balance across accuracy, efficiency, and memory cost.
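The summary describes text memory (plans, facts, pitfalls), code memory (tools), and a rule for promoting plans into tools. A minimal sketch of such a structure; the promotion rule (a set number of successes with no recorded failures) is a hypothetical stand-in for Metis's "only when justified" criterion, and the tool here simply replays stored steps through a caller-supplied executor rather than generating code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Tool = Callable[[Callable[[str], str]], List[str]]


@dataclass
class PlanRecord:
    steps: List[str]
    successes: int = 0
    failures: int = 0


@dataclass
class AgentMemory:
    plans: Dict[str, PlanRecord] = field(default_factory=dict)  # text memory: execution plans
    facts: List[str] = field(default_factory=list)              # text memory: facts
    pitfalls: List[str] = field(default_factory=list)           # text memory: pitfalls
    tools: Dict[str, Tool] = field(default_factory=dict)        # code memory: crystallized plans
    min_successes: int = 3  # hypothetical promotion threshold

    def record_outcome(self, name: str, steps: List[str], ok: bool) -> None:
        record = self.plans.setdefault(name, PlanRecord(list(steps)))
        if ok:
            record.successes += 1
        else:
            record.failures += 1
        self._maybe_crystallize(name, record)

    def _maybe_crystallize(self, name: str, record: PlanRecord) -> None:
        # Hypothetical "justified" rule: enough successes and no recorded failures.
        if name in self.tools or record.failures > 0 or record.successes < self.min_successes:
            return
        frozen = tuple(record.steps)

        def tool(execute: Callable[[str], str]) -> List[str]:
            return [execute(step) for step in frozen]

        self.tools[name] = tool


mem = AgentMemory(min_successes=2)
for _ in range(2):
    mem.record_outcome("export_report", ["open report", "export csv"], ok=True)
print(sorted(mem.tools), mem.tools["export_report"](str.upper))
```

Requiring repeated, failure-free successes before promotion keeps one-off or flaky plans in cheaper text memory instead of freezing them into tools.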