AI agents
arxiv arXiv cs.AI · 18h ago

Grounded Scaling: Determinism as a Core Limit in Agentic AI

Agentic AI performance degrades exponentially in non-deterministic environments, with k-step success falling as δ^k when per-step determinism δ < 1. The paper introduces a framework linking environment determinism to task success, verifiability, and skill evolution, proposing a Supply Certainty Index and a five-level Determinism Maturity Model. It challenges prevailing views by identifying determinism as a binding constraint across compute, data, embodiment, and alignment.

arxiv arXiv cs.LG · 22h ago

DataClaw0: Agentic Tailoring of Multimodal Data from Raw Streams

DataClaw0 introduces an agentic paradigm for actively refining raw multimodal data to align with user and downstream intents. It uses a two-stage pipeline grounded in factual anchors to generate a large-scale dataset across five domains and combines supervised fine-tuning with GRPO to achieve strong alignment with complex refinement tasks. Evaluated on video generation, VQA, and GUI navigation, DataClaw0 produces high-information-density tailored data, enabling efficient model adaptation with minimal training data.

arxiv arXiv cs.LG · 22h ago

Neural Action Codec for Vision-Language-Action Models

NAC, a neural audio codec-inspired architecture, compresses robot action trajectories as multi-channel 1D signals using multi-scale residual vector quantization. By replacing mel-spectrogram losses with time-domain and non-mel spectral reconstruction, NAC achieves high-fidelity action encoding with minimal architectural changes, outperforming existing tokenizers in reconstruction error and success rates on real-world manipulation tasks.

arxiv arXiv cs.LG · 23h ago

VLA-FAIL: Lightweight Failure Detection for Vision-Language-Action Models

VLA-FAIL introduces a lightweight, failure detection framework for vision-language-action models that uses last-layer Mahalanobis distance and action chunk consistency without requiring failure data or expensive action sampling. The framework combines these detectors to achieve reliable, early failure detection across diverse tasks, outperforming baseline methods in both accuracy and efficiency.

arxiv arXiv cs.CL · 1d ago

Micro-Transaction Markets for Verified Product Info in Agentic E-Commerce

Autonomous agents in e-commerce face a scarcity of trustworthy product information, not product matching. A proposed micro-transaction model allows agents to pay fractions of a cent to access verified data like service histories and test reports, with pricing and trust scored via reputation. This system prioritizes genuine product quality and real-time information acquisition over chatbot fluency.

arxiv arXiv cs.CL · 1d ago

Metis: Bridging Text and Code Memory for Self-Evolving Agents

Metis introduces a hierarchical dual-representation memory that combines text and code memory to improve self-evolving agents. It organizes experience into execution plans, facts, and pitfalls, crystallizing reusable plans into validated tools only when justified. Evaluated on AppWorld, Metis achieves up to 20.6% higher task accuracy and 22.8% lower execution cost than ReAct, with better overall balance across accuracy, efficiency, and memory cost.