AI agents
Claude Code Releases · 21h ago

Claude Code v2.1.191 Release Notes

Claude Code version 2.1.191 adds /rewind support, so users can resume a conversation from a point before a /clear command was run. The update fixes several critical issues, including background agents resurrecting after being stopped and the scroll position jumping during streaming responses. It also fixes /voice showing generic error messages and /login URLs being truncated in Windows Terminal. MCP servers become more reliable: capability discovery and OAuth flows now retry on transient network errors. In headless environments, OAuth no longer tries to open a browser popup, and sandbox network permissions are now remembered for the rest of the session. On the performance side, coalescing text updates cuts CPU usage during streaming by approximately 37%, and long-session memory growth from the terminal output cache is mitigated.
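
As a rough illustration of the kind of retry behavior described above, here is a minimal Python sketch that retries only transient failures, with exponential backoff. It is not Claude Code's implementation: `TransientError`, `retry_transient`, the attempt count, and the delays are all hypothetical, and the injectable `sleep` parameter exists only so the backoff schedule is easy to inspect.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures such as timeouts or connection resets."""


def retry_transient(
    op: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run op, retrying only on TransientError with exponential backoff.

    Any other exception propagates immediately, so permanent failures
    (bad credentials, malformed responses) are never retried.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return op()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last transient error
            sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
    raise AssertionError("unreachable")


# Usage: a fake capability-discovery call that fails twice, then succeeds.
calls = {"n": 0}


def discover_capabilities() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("connection reset")
    return "capabilities ok"


delays: List[float] = []
print(retry_transient(discover_capabilities, sleep=delays.append))  # capabilities ok
print(delays)  # [0.5, 1.0]
```

Separating transient from permanent errors is the key design choice: retrying a bad OAuth credential only delays the real error message.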

arXiv cs.AI · 1d ago

MetaPS: Adaptive Strategy Selection for Market Agents

MetaPS is a simulation-guided framework that lets market agents adaptively choose among programmatic strategies according to the current market state. Simulated markets supply supervised training data; at inference time, the agent selects a strategy, which then produces executable actions. Experiments show that MetaPS outperforms both fixed strategies and LLM-based agents, and that compact models can outperform stronger API-based models.
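
The train-then-select loop summarized above can be sketched in a few lines of Python. Everything here is a toy assumption rather than MetaPS itself: the two state features, the three strategies, the one-line market simulator, and the nearest-centroid classifier that stands in for whatever supervised model the paper actually trains.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical market state: (trend, autocorrelation), both in [-1, 1].
State = Tuple[float, float]

# Hypothetical programmatic strategies: each maps a state to an action
# (+1 = buy, -1 = sell, 0 = hold).
STRATEGIES: Dict[str, Callable[[State], int]] = {
    "momentum": lambda s: 1 if s[0] > 0 else -1,
    "mean_reversion": lambda s: -1 if s[0] > 0 else 1,
    "hold": lambda s: 0,
}


def simulated_return(state: State) -> float:
    """Toy simulator: positive autocorrelation continues the trend, negative reverses it."""
    trend, autocorr = state
    return autocorr * trend


def best_strategy(state: State) -> str:
    """Label a state with the strategy whose action earns the most in simulation."""
    r = simulated_return(state)
    return max(STRATEGIES, key=lambda name: STRATEGIES[name](state) * r)


def build_dataset(n: int, seed: int = 0) -> List[Tuple[State, str]]:
    """Sample simulated states and label each with its best strategy."""
    rng = random.Random(seed)
    states = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    return [(s, best_strategy(s)) for s in states]


def fit_centroids(data: List[Tuple[State, str]]) -> Dict[str, State]:
    """Nearest-centroid classifier: average state per winning strategy."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for (trend, autocorr), label in data:
        acc = sums.setdefault(label, [0.0, 0.0])
        acc[0] += trend
        acc[1] += autocorr
        counts[label] = counts.get(label, 0) + 1
    return {k: (v[0] / counts[k], v[1] / counts[k]) for k, v in sums.items()}


def select_and_act(state: State, centroids: Dict[str, State]) -> Tuple[str, int]:
    """Inference: pick the strategy whose centroid is closest, then run it."""
    name = min(
        centroids,
        key=lambda k: (state[0] - centroids[k][0]) ** 2 + (state[1] - centroids[k][1]) ** 2,
    )
    return name, STRATEGIES[name](state)


centroids = fit_centroids(build_dataset(500))
print(select_and_act((0.5, 0.8), centroids))   # ('momentum', 1)
print(select_and_act((0.5, -0.8), centroids))  # ('mean_reversion', -1)
```

The selector never emits actions directly; it only routes to a strategy, which keeps the final action executable by construction.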

arXiv cs.AI · 1d ago

Self-Evolving Cognitive Framework for Embodied Scientific Intelligence

The paper proposes a self-evolving cognitive framework in which causal world modeling lets embodied systems continuously refine their internal models through interaction. It integrates causal modeling, intervention-driven reasoning, and continual refinement, recasting embodied interaction as an epistemic process of causal discovery and knowledge acquisition. The framework supports a shift from predictive to epistemic intelligence, and the paper introduces a new benchmark for evaluating self-evolving embodied scientific intelligence.
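
To make intervention-driven reasoning concrete, the sketch below uses a toy structural causal model (an assumption, not the paper's framework) in which a hidden variable `u` drives both `x` and `y`. Passive observation overestimates the effect of `x` on `y`, while an agent that sets `x` itself (the do-operator) recovers the true effect of 2, refining its estimate one interaction at a time. `OnlineSlope`, `environment`, and `estimate_effect` are hypothetical names.

```python
import random
from typing import Optional, Tuple


class OnlineSlope:
    """Least-squares slope of y on x, refined one observation at a time."""

    def __init__(self) -> None:
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def update(self, x: float, y: float) -> None:
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    @property
    def slope(self) -> float:
        denom = self.n * self.sxx - self.sx ** 2
        return (self.n * self.sxy - self.sx * self.sy) / denom


def environment(rng: random.Random, do_x: Optional[float] = None) -> Tuple[float, float]:
    """Toy SCM: hidden u drives both x and y; the true causal effect of x on y is 2."""
    u = rng.gauss(0.0, 1.0)
    # Intervening with do(x) replaces x's mechanism, cutting the u -> x edge.
    x = u + rng.gauss(0.0, 1.0) if do_x is None else do_x
    y = 2.0 * x + 3.0 * u + rng.gauss(0.0, 1.0)
    return x, y


def estimate_effect(n: int, intervene: bool, seed: int = 0) -> float:
    """Interact n times, either observing passively or choosing x, and refine the slope."""
    rng = random.Random(seed)
    model = OnlineSlope()
    for _ in range(n):
        do_x = rng.uniform(-2.0, 2.0) if intervene else None
        model.update(*environment(rng, do_x))
    return model.slope


print(round(estimate_effect(5000, intervene=False), 2))  # should land close to 3.5 (confounded)
print(round(estimate_effect(5000, intervene=True), 2))   # should land close to 2.0 (causal)
```

Under these parameters the observational slope converges to Cov(x, y) / Var(x) = 7 / 2 = 3.5, so only the interventional estimate reflects the causal mechanism.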

arXiv cs.AI · 1d ago

LLM-Orchestrated Agent for SOI Directional Coupler Design

A large language model orchestrates the design of a silicon-on-insulator (SOI) 2x2 directional coupler by proposing gap values and assessing convergence. The design is validated with eigenmode and FDTD simulations of a common 2D effective-index model, which reveal a consistent phase offset of 2.837(11) micrometers that is corrected in a closed loop. The final device achieves a 50/50 split with a cross fraction of approximately 0.498, within 0.0017 of the 0.5 target.
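
A minimal closed-loop sketch, assuming textbook coupled-mode theory for a lossless, phase-matched coupler (cross-port power sin²(κL)) and a hypothetical exponential dependence of the coupling coefficient κ on the gap. Bisection plays the role of the LLM that proposes gap values and checks convergence; the parameters (`kappa0`, `decay_um`, a 10 µm coupling length) are invented, and the eigenmode/FDTD offset correction is not modeled.

```python
import math


def coupling(gap_um: float, kappa0: float = 0.15, decay_um: float = 0.15) -> float:
    """Hypothetical coupling coefficient (1/um) that decays exponentially with gap."""
    return kappa0 * math.exp(-gap_um / decay_um)


def cross_fraction(gap_um: float, length_um: float = 10.0) -> float:
    """Cross-port power of a lossless, phase-matched coupler (coupled-mode theory)."""
    return math.sin(coupling(gap_um) * length_um) ** 2


def propose_gap(target: float = 0.5, lo: float = 0.05, hi: float = 0.5,
                tol: float = 1e-4, max_iter: int = 60) -> float:
    """Closed-loop gap search; bisection stands in for the LLM proposer.

    For these parameters kappa*L stays in (0, pi/2) on [lo, hi], so the
    cross fraction falls monotonically as the gap widens.
    """
    for _ in range(max_iter):
        gap = 0.5 * (lo + hi)
        err = cross_fraction(gap) - target
        if abs(err) < tol:  # convergence check
            return gap
        if err > 0:
            lo = gap  # too much power crosses over: widen the gap
        else:
            hi = gap  # too little crosses over: narrow the gap
    return 0.5 * (lo + hi)


gap = propose_gap()
print(f"gap = {gap:.4f} um, cross fraction = {cross_fraction(gap):.4f}")
```

An LLM proposer can replace the bisection step without changing the loop: propose a gap, simulate, compare with the 0.5 target, repeat.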

arXiv cs.AI · 1d ago

Grounded Scaling: Determinism as a Core Limit in Agentic AI

The paper argues that agentic AI performance degrades exponentially in non-deterministic environments: when per-step determinism δ < 1, k-step success falls as δ^k. It introduces a framework linking environment determinism to task success, verifiability, and skill evolution, and proposes a Supply Certainty Index and a five-level Determinism Maturity Model. It challenges prevailing views by identifying determinism as a binding constraint across compute, data, embodiment, and alignment.
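
The δ^k relation is easy to check numerically. This sketch assumes, as the formula implies, that steps succeed independently with the same probability δ; `max_steps` is a hypothetical helper that inverts the relation to find the longest horizon that still meets a target success rate.

```python
import math


def k_step_success(delta: float, k: int) -> float:
    """Chance that k independent steps all succeed, each with probability delta."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must be a probability")
    return delta ** k


def max_steps(delta: float, target: float) -> int:
    """Longest horizon k with delta**k >= target, for 0 < delta < 1 and 0 < target < 1."""
    return math.floor(math.log(target) / math.log(delta))


print(round(k_step_success(0.99, 100), 3))  # 0.366
print(max_steps(0.99, 0.5))                 # 68
```

Even at 99% per-step determinism, a 100-step task succeeds only about 36.6% of the time, and a 50% success target caps the horizon at 68 steps.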

arXiv cs.LG · 1d ago

DataClaw0: Agentic Tailoring of Multimodal Data from Raw Streams

DataClaw0 introduces an agentic paradigm for actively refining raw multimodal data so that it aligns with user and downstream-task intents. A two-stage pipeline grounded in factual anchors generates a large-scale dataset spanning five domains, and training combines supervised fine-tuning with GRPO (Group Relative Policy Optimization) to achieve strong alignment on complex refinement tasks. Evaluated on video generation, VQA, and GUI navigation, DataClaw0 produces tailored data with high information density, enabling efficient model adaptation with minimal training data.
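
For readers unfamiliar with GRPO, the step that distinguishes it from PPO-style training is the group-relative advantage: several outputs are sampled for the same input, and each reward is normalized by the group's mean and standard deviation instead of by a learned critic. The sketch below shows only that normalization; the reward values are hypothetical, and the source does not say how DataClaw0 scores refinements.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages: score each sample against its own group.

    All rewards come from outputs sampled for the same input, so no
    learned value function (critic) is needed as a baseline.
    """
    if len(rewards) < 2:
        raise ValueError("need at least two samples per group")
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; eps guards identical rewards
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four refinements of the same raw clip.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advantages])  # [1.0, -1.0, -1.0, 1.0]
```

If every sample in a group earns the same reward, all advantages are zero and that group contributes no policy-gradient signal.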

arXiv cs.LG · 1d ago

Neural Action Codec for Vision-Language-Action Models

NAC, an architecture inspired by neural audio codecs, treats robot action trajectories as multi-channel 1D signals and compresses them using multi-scale residual vector quantization. By replacing mel-spectrogram losses with time-domain and non-mel spectral reconstruction losses, NAC achieves high-fidelity action encoding with minimal architectural changes, and it outperforms existing tokenizers in both reconstruction error and success rates on real-world manipulation tasks.
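
A minimal residual vector quantization (RVQ) sketch in plain Python, applied to a single timestep of a hypothetical two-channel action signal. It omits what makes NAC a codec: there is no learned encoder or decoder, no multi-scale structure, and the codebooks are hand-picked rather than learned. It only shows how each stage quantizes the residual left by the previous one.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Codebook = Sequence[Vector]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rvq_encode(x: Sequence[float], codebooks: Sequence[Codebook]) -> Tuple[List[int], Vector]:
    """Quantize x stage by stage; each stage encodes what earlier stages missed."""
    residual = list(x)
    codes: List[int] = []
    for cb in codebooks:
        idx = min(range(len(cb)), key=lambda i: _sq_dist(residual, cb[i]))
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return codes, residual


def rvq_decode(codes: Sequence[int], codebooks: Sequence[Codebook]) -> Vector:
    """Reconstruction is the sum of the selected codewords across stages."""
    out = [0.0] * len(codebooks[0][0])
    for idx, cb in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, cb[idx])]
    return out


# Hand-picked 2-channel codebooks (coarse, then fine); real codecs learn these.
# Each includes the zero vector, so a stage can never increase the residual.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]],
    [[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [-0.5, 0.0], [0.0, -0.5]],
]

action = [1.4, 1.0]  # one timestep of a hypothetical 2-DoF action trajectory
codes, residual = rvq_encode(action, CODEBOOKS)
print(codes)                          # [1, 1]
print(rvq_decode(codes, CODEBOOKS))   # [1.5, 1.0]
```

Each added stage refines the reconstruction, and the list of stage indices is what a VLA model would predict as discrete action tokens.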

arXiv cs.LG · 1d ago

VLA-FAIL: Lightweight Failure Detection for Vision-Language-Action Models

VLA-FAIL introduces a lightweight failure detection framework for vision-language-action models that uses the Mahalanobis distance of last-layer features and action-chunk consistency, requiring neither failure data nor expensive action sampling. Combining the two detectors yields reliable, early failure detection across diverse tasks, outperforming baseline methods in both accuracy and efficiency.
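
The two detectors can be sketched as follows, assuming last-layer features are fixed-length vectors and that consecutive action chunks overlap by all but one step. Both assumptions, the OR-combination, and the thresholds `tau_feat` and `tau_chunk` are hypothetical; the source does not specify how VLA-FAIL defines or fuses its scores. Fitting only needs features from successful rollouts, matching the no-failure-data setting.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def invert(m: Matrix) -> Matrix:
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def fit_success_stats(feats: Sequence[Sequence[float]], ridge: float = 1e-6) -> Tuple[List[float], Matrix]:
    """Mean and inverse covariance of last-layer features from successful rollouts only."""
    n, d = len(feats), len(feats[0])
    mean = [sum(f[j] for f in feats) / n for j in range(d)]
    cov = [
        [sum((f[i] - mean[i]) * (f[j] - mean[j]) for f in feats) / (n - 1)
         + (ridge if i == j else 0.0) for j in range(d)]
        for i in range(d)
    ]
    return mean, invert(cov)


def mahalanobis(x: Sequence[float], mean: Sequence[float], inv_cov: Matrix) -> float:
    """Distance of x from the success distribution, scaled by its covariance."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    d = len(diff)
    d2 = sum(diff[i] * inv_cov[i][j] * diff[j] for i in range(d) for j in range(d))
    return math.sqrt(max(d2, 0.0))


def chunk_inconsistency(prev_chunk: Sequence[Sequence[float]],
                        curr_chunk: Sequence[Sequence[float]]) -> float:
    """Mean L2 gap where consecutive action chunks overlap (prev is one step older)."""
    pairs = list(zip(prev_chunk[1:], curr_chunk[:-1]))
    return sum(math.dist(p, c) for p, c in pairs) / len(pairs)


def flag_failure(feat, prev_chunk, curr_chunk, mean, inv_cov,
                 tau_feat: float = 3.0, tau_chunk: float = 0.5) -> bool:
    """Hypothetical OR-combination of the two detectors with hand-set thresholds."""
    return (mahalanobis(feat, mean, inv_cov) > tau_feat
            or chunk_inconsistency(prev_chunk, curr_chunk) > tau_chunk)


success_feats = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
mean, inv_cov = fit_success_stats(success_feats)
chunk_a = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
chunk_b = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
print(flag_failure([0.5, 0.5], chunk_a, chunk_b, mean, inv_cov))  # False
print(flag_failure([3.0, 3.0], chunk_a, chunk_b, mean, inv_cov))  # True
```

Both scores are cheap to compute from a single forward pass plus the previous chunk, which is consistent with the framework avoiding expensive action sampling.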