AI agents — korshunov.ai

Sovereign Execution Broker for Certificate-Bound Agentic Control

The Sovereign Execution Broker (SEB) introduces a runtime enforcement boundary that verifies and executes certified authority in agentic systems. It validates execution contracts, checks validity periods, and ensures policy compliance before invoking infrastructure APIs, providing a short-lived, auditable, and revocable execution capability. The prototype was evaluated on AWS and Kubernetes, measuring latency, revocation propagation, and fault injection resistance.

arXiv cs.AI · 7d ago

LedgerAgent: Structured State for Policy-Adherent Tool-Calling Agents

LedgerAgent introduces a structured ledger to maintain task states separately in tool-calling agents. It renders states into prompts and enforces policy constraints before tool execution, reducing policy violations and improving performance across customer-service domains.

GitHub CrewAI · 7d ago

v1.14.8a2 Release Notes

v1.14.8a2 adds a single agent action to Flow definitions and validates CEL expressions at load time. It also includes a new Datadog integration guide with an importable operations dashboard, plus an updated snapshot and changelog for v1.14.8a1.

arXiv cs.LG · 7d ago

Style Diversity Outperforms Topic Diversity in Annotation-Free Synthetic Data

A new framework generates synthetic dialogue from intent definitions alone, with no human-annotated data. It incorporates topic and style attributes, with post-hoc stylization models Univ and Exam, and an LLM-as-a-judge filtering process. The synthetic data reaches up to 93.3% of the performance obtained with human-annotated data, and the results indicate that style diversity matters more than topic diversity for data utility.

arXiv cs.LG · 7d ago

Agentic Symbolic Search for PDE Solution Characterization

ASYS proposes a prior-guided framework that uses mathematical theory and evolutionary search to generate interpretable symbolic forms of PDE solutions. It produces analytical representations for complex problems like Allen-Cahn dynamics and Keller-Segel blow-up, offering new pathways for mathematical analysis beyond traditional methods.

arXiv cs.LG · 7d ago

UltraQuant: 4-bit KV Caching for Context-Heavy Agents

UltraQuant introduces a 4-bit KV caching method tailored for context-heavy agent workloads. Compared to FP8 KV caching, it achieves a 3.47x reduction in P50 time-to-first-token in late rounds and 1.63x higher output throughput, using FP8 queries, FP4 KV tensors, and native AMD CDNA4 scaled-MFMA support.

arXiv cs.LG · 7d ago

Marginal Advantage Accumulation for Memory-Driven Agent Self-Evolution

This paper introduces Marginal Advantage Accumulation (MAA), a post-processing architecture that addresses cross-batch inconsistency in memory-driven agent self-evolution. MAA formalizes alignment and comparability as structural conditions, uses differential signals and an exponential moving average to accumulate signed evidence per operation, and ensures traceability via semantic identity merging. It outperforms batch-level baselines in 14 of 16 settings and reduces token consumption by about 75%.
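
The accumulation step mentioned above can be illustrated with a generic exponential moving average over signed per-operation signals. This is a minimal sketch, not the paper's update rule: the function name `accumulate_evidence`, the operation names, the smoothing factor `alpha`, and the choice to leave operations untouched in batches where they do not appear are all assumptions made for illustration.

```python
def accumulate_evidence(batches, alpha=0.5):
    """Fold signed per-operation signals into a running EMA score.

    `batches` is a list of dicts mapping a (hypothetical) operation name to a
    signed signal for that batch. An operation missing from a batch keeps its
    previous score; that is an assumption of this sketch.
    """
    evidence = {}
    for batch in batches:
        for op, signal in batch.items():
            prev = evidence.get(op, 0.0)
            # Standard EMA: old evidence decays, new signed signal is blended in.
            evidence[op] = (1.0 - alpha) * prev + alpha * signal
    return evidence


# Hypothetical signals: positive means the operation helped in that batch.
batches = [
    {"add_rule": 1.0, "drop_note": -1.0},
    {"add_rule": -1.0},
    {"add_rule": 1.0},
]
scores = accumulate_evidence(batches)
print(scores)
```

Because the score is a running average of signed signals, one contradictory batch lowers an operation's score without erasing earlier evidence, which is one plausible reading of how accumulation could smooth cross-batch inconsistency.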

arXiv cs.LG · 7d ago

Evaluator Bias Propagation in Multi-Agent LLM Systems

Contagion Networks introduces a framework to measure how evaluator biases spread among LLM agents. In a 3-agent experiment, biases propagate with coefficients between 0.157 and 0.352, and homogeneous-model agents show significantly weaker contagion than cross-model setups. Increasing evaluator committee size from k=1 to k=3 reduces effective contagion by 72.4%.

arXiv cs.LG · 7d ago

Probe-and-Refine Tuning Improves Coding Agent Performance

A new method called probe-and-refine tuning uses synthetic bug-fix probes to iteratively improve repository guidance files with single-shot LLM calls, without agent loops or tool use. On SWE-bench Verified, it achieves a 33.0% mean resolve rate, 14.5 percentage points higher than the initial static knowledge base, a gain that reflects improved coverage rather than patch precision. The method enables agents to use larger step budgets effectively, and performance remains stable across models when diagnostic output is sufficient.

arXiv cs.LG · 7d ago

Sovereign Execution Broker for Certificate-Bound Agentic Control

The Sovereign Execution Broker (SEB) introduces a runtime enforcement boundary that verifies and executes certified authority in agentic systems. It ensures production mutation authority is isolated from non-deterministic reasoning by validating execution contracts, validity windows, and revocation states before invoking infrastructure APIs. The prototype demonstrates secure, auditable execution on AWS and Kubernetes with measurable latency and fault resilience.
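
The pre-invocation checks described above (execution contract, validity window, revocation state) can be sketched as a single gate function. This is a minimal illustration under stated assumptions, not the SEB implementation: the `ExecutionContract` fields, the `authorize` function, and the in-memory revocation set are all hypothetical, and the summary does not specify a schema or an order of checks.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ExecutionContract:
    # Hypothetical fields; the source does not define the contract format.
    contract_id: str
    action: str
    not_before: datetime
    not_after: datetime


def authorize(contract, allowed_actions, revoked_ids, now):
    """Return (ok, reason). A caller would invoke the infrastructure API only if ok."""
    if contract.action not in allowed_actions:
        return False, "action not permitted"
    if not (contract.not_before <= now <= contract.not_after):
        return False, "outside validity window"
    if contract.contract_id in revoked_ids:
        return False, "contract revoked"
    return True, "authorized"


# Fixed timestamp so the example is deterministic.
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
contract = ExecutionContract(
    contract_id="c-1",
    action="scale_deployment",
    not_before=now - timedelta(minutes=1),
    not_after=now + timedelta(minutes=4),
)

print(authorize(contract, {"scale_deployment"}, set(), now))
print(authorize(contract, {"scale_deployment"}, {"c-1"}, now))
print(authorize(contract, {"scale_deployment"}, set(), now + timedelta(minutes=10)))
```

Keeping the gate a pure function of the contract, the policy, the revocation set, and the clock means the decision stays deterministic and easy to audit, independent of whatever reasoning produced the request.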

arXiv cs.LG · 7d ago

Execution-State Capsules for Low-Latency On-Device AI Serving

Execution-state capsules enable graph-bound checkpointing and restoration of complete execution state, including KV, recurrent, and convolution states, for low-latency, small-batch on-device AI serving. On RTX 5090 and Jetson AGX Thor, capsule restore achieves byte-exact and token-identical correctness, with sub-millisecond GPU operations and TTFT speedups up to 27x at 16k tokens, demonstrating significant latency reduction in interactive AI workflows.

arXiv cs.CL · 7d ago

H-RePlan: Hierarchical Recovery for Cross-Device Agent Systems

H-RePlan introduces a hierarchical replanning framework that separates device-local strategy recovery from global orchestrator replanning. Through scope-aware recovery in multi-device agent systems, it achieves higher completion and instruction adherence than existing baselines at a reduced token cost.

arXiv cs.AI · 7d ago

AI Economist Agent: Model-Grounded Economic Analysis Framework

The AI Economist Agent uses RAG, knowledge graphs, and LLMs to generate economic narratives grounded in theory and data. It enables model-based analysis, evidence retrieval, and report generation, ensuring economic coherence and traceability through explicit model computations.

arXiv cs.AI · 7d ago

See-and-Reach: Vision-Language Navigation for UAVs in Field of View

UAV-VLN-FOV isolates the see-and-reach stage for precise evaluation of UAV navigation. 3DG-VLN enhances visual grounding and spatial alignment using dynamic 3D direction cues, achieving a 13.82% success rate improvement over baselines, and is validated in real-world trials.

arXiv cs.AI · 7d ago

Task Manager Reduces Queue Latency by 14-75% at Enterprise Scale

A Task Manager introduces priority inference, related-event merging, and preemption to enable continuous operation in enterprise AI. It reduces high-priority queue latency by 14-77% and improves related-event correctness by over 20 percentage points at enterprise scale, addressing agent discovery noise as the primary bottleneck.

arXiv cs.AI · 7d ago

Attention-Based SAC for Porosity Prediction in Additive Manufacturing

A multi-head attention feature extractor integrated with Soft Actor-Critic improves porosity prediction and process parameter optimization in laser powder bed fusion. The method achieves a convergence value of 322.79 in 14 episodes, outperforming DQN, PPO, TD3, and vanilla SAC with faster convergence and greater stability.

arXiv cs.AI · 7d ago

Sensorimotor World Models for Action-Aligned Perception

A new sensorimotor world model (SMWM) learns compact, action-relevant latent representations from offline trajectories. It uses inverse dynamics regularization to prevent representation collapse and align latent states with controllable environmental degrees of freedom, enabling stable training without complex regularizers or frozen components. SMWM achieves competitive planning performance in 2D and 3D control tasks.

arXiv cs.AI · 7d ago

Dual-Agent Framework for Cross-Model Verified Translation

A dual-agent framework converts natural-language experiment protocols into executable commands for robotic lab platforms. It uses a Parser Agent and a rule-based mapping engine to translate protocols, with a heterogeneous LLM Validation Agent ensuring accuracy and triggering self-correction. The framework successfully enables end-to-end autonomous execution of microplate-based experiments like the Bradford assay.

arXiv cs.AI · 7d ago

ScaffoldAgent: Utility-Guided Dynamic Outline Optimization

ScaffoldAgent introduces a utility-guided framework for dynamic outline optimization in open-ended deep research. It models outline evolution through Expansion, Contraction, and Revision operations, guided by a feedback mechanism that evaluates retrieval gain, structural coherence, and generation quality. Experiments show it improves long-form report generation and factual grounding compared to existing agents.