AI agents
arXiv cs.CL · 2d ago

Micro-Transaction Markets for Verified Product Info in Agentic E-Commerce

The bottleneck for autonomous agents in e-commerce is a scarcity of trustworthy product information, not product matching. A proposed micro-transaction model lets agents pay fractions of a cent to access verified data such as service histories and test reports, with pricing and trust tied to reputation scores. The system prioritizes genuine product quality and real-time information acquisition over chatbot fluency.

arXiv cs.CL · 2d ago

Metis: Bridging Text and Code Memory for Self-Evolving Agents

Metis introduces a hierarchical dual-representation memory that combines text and code memory to improve self-evolving agents. It organizes experience into execution plans, facts, and pitfalls, crystallizing reusable plans into validated tools only when justified. Evaluated on AppWorld, Metis achieves up to 20.6% higher task accuracy and 22.8% lower execution cost than ReAct, with better overall balance across accuracy, efficiency, and memory cost.
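
As a rough illustration of the "crystallize only when justified" idea, the sketch below keeps plans, facts, and pitfalls as text memory and promotes a plan to a tool only after repeated success and validation. It is not the Metis implementation: the class names, the `min_successes` threshold, and the `validated` flag are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PlanRecord:
    """A text-memory entry for one execution plan (hypothetical structure)."""
    steps: List[str]
    successes: int = 0       # how often the plan was reused successfully
    validated: bool = False  # set by a separate validation step, not shown here


@dataclass
class DualMemory:
    """Toy memory holding facts, pitfalls, plans, and promoted tools."""
    facts: List[str] = field(default_factory=list)
    pitfalls: List[str] = field(default_factory=list)
    plans: Dict[str, PlanRecord] = field(default_factory=dict)
    tools: Dict[str, List[str]] = field(default_factory=dict)  # stand-in for code memory

    def record_success(self, name: str, steps: List[str]) -> None:
        """Store the plan on first use and count each successful reuse."""
        rec = self.plans.setdefault(name, PlanRecord(steps=list(steps)))
        rec.successes += 1

    def crystallize(self, name: str, min_successes: int = 3) -> bool:
        """Promote a plan to a tool only if it is proven and validated.

        The threshold of 3 is an arbitrary assumption for illustration.
        """
        rec = self.plans.get(name)
        if rec is None or rec.successes < min_successes or not rec.validated:
            return False
        self.tools[name] = rec.steps
        return True
```

In this toy version, a plan that has succeeded often but has not been validated stays in text memory, which mirrors the summary's point that tools are created selectively rather than for every experience.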

arXiv cs.CL · 2d ago

Agon: Autonomous Research System via Prompt Economy

Agon is an autonomous research system that uses prompt economy to validate checkable claims in workflows, leaving judgment to human scientists. It operates across 444 iterations with minimal prompts and no human-written code, revealing a taxonomy of failures by severity, fixability, visibility, and capability locus. The system demonstrates scalability and advances research toward a paradigm where machines handle scale and humans guide judgment.

arXiv cs.CL · 2d ago

Dialogue to Discovery: Attribute-Aware Preference Elicitation

Dialogue to Discovery (D2D) is an attribute-oriented framework that improves conversational product search by dynamically guiding user interactions. It adapts query priorities and recommendation timing, achieving 22.2–29.9% higher target-finding accuracy, 6.6–16.1% lower abandonment, and 27.5% shorter conversations than existing methods, with user studies confirming improved satisfaction and efficiency.

arXiv cs.CL · 2d ago

EDV Framework Enables Reliable Experience Learning for Agentic Systems

The EDV framework introduces an Execute-Distill-Verify paradigm to overcome the self-confirmation trap in large language model agents. By using multiple agents to explore tasks, a third-party agent to distill experiences, and a consensus-based verification step, EDV ensures only accurate experiences are stored in memory. Evaluation on tau2-bench, Mind2Web, and MMTB shows EDV outperforms strong baselines, demonstrating its effectiveness in enabling robust agent self-evolution.
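
The consensus-based verification step can be pictured with a minimal sketch like the one below. It assumes each verifier is a plain function that accepts or rejects a distilled experience; the `Verifier` type, the `consensus_filter` name, and the `min_agree` threshold are hypothetical and not taken from the paper.

```python
from typing import Callable, List

# Hypothetical verifier: returns True if it judges the experience accurate.
Verifier = Callable[[str], bool]


def consensus_filter(
    candidates: List[str],
    verifiers: List[Verifier],
    min_agree: float = 1.0,
) -> List[str]:
    """Keep candidates accepted by at least `min_agree` of the verifiers.

    min_agree=1.0 means unanimous agreement; this default is an assumption.
    """
    if not verifiers:
        return []  # with no verifiers, nothing can be confirmed
    kept = []
    for experience in candidates:
        votes = sum(1 for verify in verifiers if verify(experience))
        if votes / len(verifiers) >= min_agree:
            kept.append(experience)
    return kept
```

Only the experiences that survive this filter would be written to memory, so a single agent's mistaken conclusion cannot confirm itself.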

arXiv cs.CL · 2d ago

MEMPROBE: Benchmark for Long-Term Memory Recovery in Agents

MEMPROBE is a benchmark that evaluates long-term memory in AI agents by reconstructing a user's hidden state from the agent's memory after interaction. It tests 5 memory systems across 50 simulated users with 31 dimensions each, finding that task completion is high even for memoryless agents, while memory recovery remains moderate and drops under top-k retrieval. MEMPROBE enables direct, auditable assessment of memory retention and proposes recovery as a key objective for future agent development.
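
To make the recovery idea concrete, here is a small sketch that scores how much of a user's hidden state can be reconstructed from memory. It assumes the hidden state is a flat mapping from dimension name to value and uses exact-match scoring; both are simplifications for illustration, and `recovery_rate` is a hypothetical helper, not the benchmark's metric.

```python
from typing import Any, Dict


def recovery_rate(hidden_state: Dict[str, Any], reconstructed: Dict[str, Any]) -> float:
    """Fraction of hidden-state dimensions recovered exactly from memory."""
    if not hidden_state:
        return 0.0
    hits = sum(
        1
        for dimension, value in hidden_state.items()
        if dimension in reconstructed and reconstructed[dimension] == value
    )
    return hits / len(hidden_state)


# Made-up example values, not benchmark data.
hidden = {"diet": "vegetarian", "city": "Lyon", "budget": "low", "pet": "cat"}
from_memory = {"diet": "vegetarian", "city": "Paris", "pet": "cat"}
score = recovery_rate(hidden, from_memory)  # 2 of 4 dimensions match
```

A score like this is independent of task completion, which is why an agent can finish tasks while still retaining little about the user.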

arXiv cs.LG · 2d ago

Distilling Transformers into Recurrent Transformers for Efficient Memory

A new distillation method transfers the observation compression strategy of full-history transformers to recurrent models. By training a teacher model to compress observation histories into fixed-size bottlenecks, the approach aligns the student's memory with the teacher's compression. This enables recurrent transformers to achieve near-full-history performance with linear-time complexity, making them viable for long-horizon robotics applications.
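
One simple way to picture "aligning the student's memory with the teacher's compression" is a mean-squared-error penalty between the two vectors. The summary does not state which loss the paper uses, so the MSE choice and the `alignment_loss` helper below are assumptions for illustration only.

```python
from typing import Sequence


def alignment_loss(
    student_memory: Sequence[float],
    teacher_bottleneck: Sequence[float],
) -> float:
    """Mean squared error between student memory and teacher bottleneck.

    Both vectors must share the same fixed size; MSE is an assumed choice.
    """
    if len(student_memory) != len(teacher_bottleneck):
        raise ValueError("memory and bottleneck must have the same size")
    if len(student_memory) == 0:
        raise ValueError("vectors must be non-empty")
    squared_errors = (
        (s - t) ** 2 for s, t in zip(student_memory, teacher_bottleneck)
    )
    return sum(squared_errors) / len(student_memory)
```

Because the bottleneck has a fixed size, a loss of this shape costs the same at every step regardless of how long the observation history is, which is consistent with the linear-time claim.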