AI agents
arxiv arXiv cs.AI · 19h ago

Self-Evolving Cognitive Framework for Embodied Scientific Intelligence

The paper proposes a self-evolving cognitive framework that uses causal world modeling to enable embodied systems to continuously refine their internal models through interaction. It integrates causal modeling, intervention-driven reasoning, and continual refinement, redefining embodied interaction as an epistemic process for causal discovery and knowledge acquisition. The framework supports a shift from predictive to epistemic intelligence, with a new benchmark for evaluating self-evolving embodied scientific intelligence.

arxiv arXiv cs.AI · 20h ago

LLM-Orchestrated Agent for SOI Directional Coupler Design

A large language model orchestrates the design of a silicon-on-insulator 2x2 directional coupler by proposing gap values and assessing convergence. The design is validated through eigenmode and FDTD simulations on a common 2D effective-index model, showing a consistent phase offset of 2.837(11) micrometers that is corrected in a closed-loop process. The final device achieves a 50/50 split with a cross fraction of 0.498, within 0.0017 of the target.

arxiv arXiv cs.AI · 21h ago

Grounded Scaling: Determinism as a Core Limit in Agentic AI

Agentic AI performance degrades exponentially in non-deterministic environments, with k-step success falling as δ^k when per-step determinism δ < 1. The paper introduces a framework linking environment determinism to task success, verifiability, and skill evolution, proposing a Supply Certainty Index and a five-level Determinism Maturity Model. It challenges prevailing views by identifying determinism as a binding constraint across compute, data, embodiment, and alignment.

arxiv arXiv cs.LG · 1d ago

DataClaw0: Agentic Tailoring of Multimodal Data from Raw Streams

DataClaw0 introduces an agentic paradigm for actively refining raw multimodal data to align with user and downstream intents. It uses a two-stage pipeline grounded in factual anchors to generate a large-scale dataset across five domains and combines supervised fine-tuning with GRPO to achieve strong alignment with complex refinement tasks. Evaluated on video generation, VQA, and GUI navigation, DataClaw0 produces high-information-density tailored data, enabling efficient model adaptation with minimal training data.

arxiv arXiv cs.LG · 1d ago

Neural Action Codec for Vision-Language-Action Models

NAC, a neural audio codec-inspired architecture, compresses robot action trajectories as multi-channel 1D signals using multi-scale residual vector quantization. By replacing mel-spectrogram losses with time-domain and non-mel spectral reconstruction, NAC achieves high-fidelity action encoding with minimal architectural changes, outperforming existing tokenizers in reconstruction error and success rates on real-world manipulation tasks.

arxiv arXiv cs.LG · 1d ago

VLA-FAIL: Lightweight Failure Detection for Vision-Language-Action Models

VLA-FAIL introduces a lightweight, failure detection framework for vision-language-action models that uses last-layer Mahalanobis distance and action chunk consistency without requiring failure data or expensive action sampling. The framework combines these detectors to achieve reliable, early failure detection across diverse tasks, outperforming baseline methods in both accuracy and efficiency.