AI agents — korshunov.ai

Learning Red Agent Policy from Observations for Neurosymbolic Cyber Agents

This work uses imitation learning to predict red agent actions in partially observable cyber environments. The method learns red agent policies from network observations and defender actions, enabling neurosymbolic cyber-defense agents to accurately predict attacks and adapt their defenses across diverse simulated scenarios.

arXiv cs.AI · 9d ago

EvolveNav: Self-Evolving Memory for Zero-Shot Navigation

EvolveNav introduces a self-evolving framework for zero-shot object-goal navigation that improves during test time. It uses a rule memory derived from past trajectories and a confidence-based retrieval strategy to select effective actions, reducing redundant exploration. The method achieves a 10.1% higher success rate than existing baselines with fewer unnecessary steps.

arXiv cs.AI · 9d ago

ReproRepo: Scaling Reproducibility Audits with GitHub Issues

ReproRepo introduces a scalable framework using GitHub issues to evaluate ML paper reproducibility. It shows that LLM agents like Codex with GPT-5.5 identify at least one blocker in 90% of paper-repository pairs without executing code, though exact localization remains challenging.

arXiv cs.AI · 9d ago

Visual Verification Enables Inference-time Steering and Autonomous Policy Improvement

VERITAS introduces a generator-verifier framework that enables robots to improve policies in real time without additional training. A visual verifier evaluates actions at inference time, allowing consistent performance gains through verified rollouts that serve as effective supervision for offline policy improvement. Post-training with these verified rollouts matches expert demonstrations in efficiency, without human intervention.

Negative Token Filtering for Stable Single-Rollout RL

Implicit vs. Explicit Prompting in LVLMs for Referential Communication

NarrativeWorldBench and N-VSSM for Long-Horizon Audio Drama

LLM Recommendation Bias and Brand Competition Dynamics

MODE-RAG: Evaluating and Reducing Hallucinations in M-RAG

PARSE: Real-Document Defense for LLM Agents

AIPatient Arena: EHR-grounded evaluation of LLMs in clinical workflows

Routing Accuracy Degradation and Recovery in Enterprise Agent Systems

LLMs Outperform Humans in Next Speaker Prediction

OPD-Evolver: On-Policy Distillation for Holistic Agent Evolving

SkillMigrator Enables Cross-Site Web Skill Transfer via Layout Matching

EnvRL: Leveraging Environment Dynamics in Agentic RL

LLM-Designed Training Environment for RL with Multi-Agent Reasoning

EComAgentBench: Benchmarking Shopping Agents with Hidden Intent

AI-Driven Avatars Enable Realistic ACT Psychotherapy Training

Coding Benchmarks Misaligned with Agentic Software Engineering