arXiv cs.CL — korshunov.ai

Source · arXiv cs.CL

Geographic Bias in Large Language Models from User Metadata

A study finds that user metadata leads large language models to give region-specific responses even to neutral prompts. Location leakage increases by up to 793 times in some models, and replacing the location with 'Unknown' still causes significant bias, indicating that the user profile frame itself acts as a conditioning signal.

arXiv cs.CL · 8d ago

Agentic Benchmark Reveals AI Models Fail to Avoid Animal Exploitation

TAC, the first agentic benchmark for implicit animal welfare, tests AI agents' ability to avoid animal exploitation in travel booking scenarios. All seven frontier models score below 64%, with the best at 53%, and even minor prompt improvements yield only modest gains. An audit finds no signs of evaluation awareness, indicating performance gaps stem from lack of true welfare reasoning, not prompt recognition.

arXiv cs.CL · 8d ago

Red-Team Study Finds Frontier LLMs Remain Vulnerable to Automated Attacks

A red-team study of Anthropic's Fable 5 and Opus 4.8 models finds both vulnerable to adaptive iterative attacks, with Opus 4.8 breached on 11.5% of intents and Fable 5 on 6.1%. Despite robust defenses, the automated attacks produced 1,620 and 702 panel-confirmed harmful completions against the two models, spanning all harm categories.

arXiv cs.CL · 8d ago

d-OPSD: On-policy Self-distillation for Diffusion LLMs

d-OPSD is the first on-policy self-distillation framework designed for diffusion LLMs. It uses self-generated answers as suffix conditioning and step-level supervision, enabling efficient post-training with only about 10% of RLVR's optimization steps while outperforming RLVR and SFT baselines on four reasoning benchmarks.

arXiv cs.CL · 8d ago

RubricsTree: Scalable Evaluation Framework for Personal Health Agents

RubricsTree introduces a hierarchical taxonomy of over 100 clinically verifiable Boolean rubrics, evolved from 4,000 real user queries via human-in-the-loop curation. It enables scalable, expert-aligned evaluation of personal health agents by dynamically routing queries to relevant rubrics. It outperforms baseline methods in alignment and context sensitivity, with model performance gains of up to 66% on HealthBench.

arXiv cs.CL · 8d ago

ZPPO: Teacher in Prompts, Not Gradients

Zone of Proximal Policy Optimization (ZPPO) integrates teacher knowledge directly into prompts rather than policy gradients. It uses Binary and Negative Candidate-included Questions to surface student failure modes and amplifies learning through a prompt replay buffer, achieving superior performance on hard questions across student scales, especially at smaller model sizes.

arXiv cs.CL · 8d ago

Visuals Lie, Consistency Speaks: Disentangling Spatial Attention from Reliability in Vision-Language Models

A study challenges the assumption that visual attention signals reliability in vision-language models. It finds near-zero correlation between spatial attention and accuracy, showing instead that self-consistency across reasoning paths is a stronger predictor of truth. Reliability is better explained by generation dynamics and internal state distributions, not visual attention patterns.
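The summary does not say how self-consistency is measured. One common formulation, shown here as a minimal sketch and not as the paper's metric, samples several reasoning paths and uses the share that agree on the majority answer as a reliability score. The function name `self_consistency` and the sample answers are hypothetical.

```python
from collections import Counter


def self_consistency(answers):
    """Return (majority_answer, agreement) for final answers sampled
    from several independent reasoning paths.

    Agreement is the fraction of paths that land on the majority answer.
    Answers are compared after trimming whitespace and lowercasing.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(a.strip().lower() for a in answers)
    majority, votes = counts.most_common(1)[0]
    return majority, votes / len(answers)


# Hypothetical final answers from four sampled reasoning paths.
samples = ["A cat", "a cat", "A dog", "a cat "]
answer, agreement = self_consistency(samples)
print(answer, agreement)
```

Under this assumption, a low agreement score flags an answer as unreliable regardless of where the model's spatial attention falls.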

arXiv cs.CL · 8d ago

NarrativeWorldBench and N-VSSM for Long-Horizon Audio Drama

NarrativeWorldBench evaluates 21 LLMs on nine narrative-structure metrics across horizons of 10 to 200 episodes, with cross-lingual support in Hindi, Tamil, Telugu, and Marathi. N-VSSM, a latent world model using Mamba-2, achieves plot-beat F1 of at least 0.84 across all horizons with 4x lower compute than closed-frontier models and outperforms Claude Opus 4.5 in long-arc consistency and controllability in a professional writer study.

arXiv cs.CL · 8d ago

PARSE: Real-Document Defense for LLM Agents

PARSE reduces prompt injection attack success from 25.4% to 15.6% on real enterprise documents across five professional domains, with statistically significant improvement (p=0.014) and 86.9% utility. It outperforms paraphrasing and uses provenance-aware sanitization to preserve factual content while routing most documents through a lightweight path.

arXiv cs.CL · 8d ago

STATEWITNESS: Activation Explainer for Deception Auditing in LLMs

STATEWITNESS introduces an activation explainer that audits deception in reasoning LLMs by reading hidden states and generating natural-language answers or structured reports. It achieves a 0.916 mean AUROC, outperforming existing black-box monitors and activation probes by 11.6% and 25.0% respectively, and provides query-level, schema, and evidence-level traces for human inspection.

arXiv cs.CL · 8d ago

Routing Accuracy Degradation and Recovery in Enterprise Agent Systems

As enterprise agent tool catalogs scale from 10 to 110 agents, routing accuracy drops 16 to 23 percentage points on under-specified requests. An oracle analysis identifies retrieval and confusion gaps, with embedding-based shortlisting recovering +10 to 11 pp F1. A human-annotated study of 1,435 utterances confirms real-world recovery of +10 to 17 pp despite lower absolute performance.
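Embedding-based shortlisting is not described further in the summary. The sketch below shows the general idea under stated assumptions: each agent has a description embedding, the request is embedded in the same space, and only the top-k agents by cosine similarity are passed to the router. The agent names, the three-dimensional vectors, and the `shortlist` helper are all hypothetical, and a real system would get its vectors from an embedding model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def shortlist(query_vec, agent_vecs, k=3):
    """Return the names of the k agents most similar to the query."""
    ranked = sorted(
        agent_vecs.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Toy embeddings standing in for real agent-description vectors.
agents = {
    "billing": [0.9, 0.1, 0.0],
    "hr": [0.1, 0.9, 0.1],
    "it_support": [0.0, 0.2, 0.9],
    "payroll": [0.8, 0.3, 0.1],
}
top = shortlist([1.0, 0.2, 0.0], agents, k=2)
print(top)
```

Shrinking the candidate set this way means the router chooses among a handful of agents instead of the full catalog, which is one plausible reason shortlisting helps as catalogs grow.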

arXiv cs.CL · 8d ago

LLM Features Can Hurt GNNs via Concatenation Interference

Concatenating LLM-generated features to graph neural networks systematically reduces accuracy on homophilous benchmarks, with PubMed accuracy dropping by 17.0 ± 0.3 pp. This degradation is linked to LLM-alone discriminability (Delta_sig), which correlates strongly with concatenation cost (r² = 0.38) and shows a power-law relationship with feature dimension and node count (r² = 0.97), particularly in low-Delta_sig, low-node scenarios.

arXiv cs.CL · 8d ago

OPD-Evolver: On-Policy Distillation for Holistic Agent Evolving

OPD-Evolver introduces a slow-fast co-evolution framework that enables agents to select, act on, and reuse experience through on-policy self-distillation. It outperforms existing memory-based and training-based methods by up to 11.5% and 5.8% respectively, and demonstrates the capability to challenge large-scale models such as Qwen3.5-397B-A17B and Step-3.5-Flash.

arXiv cs.CL · 8d ago

SkillMigrator Enables Cross-Site Web Skill Transfer via Layout Matching

SkillMigrator learns reusable web skills by matching layout structures instead of specific element references. It stores each skill as a transferable interaction pattern (TIP) with a structural sketch, enabling efficient skill reuse across sites. Compared to state-of-the-art methods, it reduces average LLM-action counts by 8-10% on WebArena and Mind2Web at matched success rates.

arXiv cs.CL · 8d ago

MambaCount: Efficient Text-guided Object Counting

MambaCount introduces a spatial sparse state space duality block to enable efficient text-guided open-vocabulary object counting. It addresses causal modeling limitations and high entropy in spatial token responses, achieving state-of-the-art results on FSC-147 with a test MAE of 12.23 while maintaining linear complexity.

arXiv cs.CL · 8d ago

EnvRL: Leveraging Environment Dynamics in Agentic RL

EnvRL introduces a framework that enhances agentic reinforcement learning by incorporating environment dynamics through state prediction and inverse dynamics objectives. It achieves significant gains in success rates on long-horizon benchmarks, improving Qwen-2.5-1.5B-Instruct performance from 72.8% to 77.4% on ALFWorld and from 56.8% to 67.0% on WebShop when trained with GRPO.

arXiv cs.CL · 8d ago

LLM-Designed Training Environment for RL with Multi-Agent Reasoning

The LLM-as-Environment-Engineer framework uses LLMs to automatically redesign training environments in reinforcement learning by analyzing failure trajectories and contextual data. On the MAPF-FrozenLake testbed, it outperforms larger proprietary LLMs and fixed-environment baselines, with Qwen3-4B achieving the strongest aggregate performance. Analysis shows that failure evidence and preserved working configurations are key, and the current RL checkpoint performs better than the base model as an environment engineer.

arXiv cs.CL · 8d ago

SuCo: Sufficiency-guided Continuous Adaptive Reasoning

SuCo defines the Minimal Sufficient CoT (MSC) as the shortest reasoning prefix that is sufficient to produce a correct answer. It employs a two-stage training framework (MSC-Aligned Fine-Tuning and Sufficiency-Aware Policy Optimization) to reduce reasoning length while maintaining or improving accuracy across math, code, and science tasks.
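The idea of a shortest sufficient prefix can be illustrated with a simple search. This is a minimal sketch and not SuCo's procedure: it assumes the reasoning is already split into steps and that a hypothetical `is_sufficient` predicate reports whether the model answers correctly when given only that prefix. Here the predicate is a toy stand-in that checks for a specific step.

```python
def minimal_sufficient_prefix(steps, is_sufficient):
    """Return the shortest prefix of `steps` accepted by `is_sufficient`,
    or None if even the full chain is not sufficient.

    A linear scan is used because sufficiency need not be monotone in
    prefix length, so binary search could skip the shortest prefix.
    """
    for n in range(len(steps) + 1):
        prefix = steps[:n]
        if is_sufficient(prefix):
            return prefix
    return None


# Toy chain of thought for "solve x + 2 = 5".
steps = ["x + 2 = 5", "x = 5 - 2", "x = 3", "check: 3 + 2 = 5"]


def toy_is_sufficient(prefix):
    # Hypothetical stand-in for "the model answers correctly from this prefix".
    return "x = 3" in prefix


msc = minimal_sufficient_prefix(steps, toy_is_sufficient)
print(msc)
```

In this toy case the final verification step is dropped, which mirrors the summary's claim that reasoning can be shortened without losing the correct answer.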

arXiv cs.CL · 8d ago

Vision-language models don't always need images for chest X-ray accuracy

A causal audit shows that text-only models match multimodal models in chest radiography accuracy. Across nine systems, a text-only model performs within 5.7 points of the best multimodal model, and a 119-billion-parameter model is indistinguishable from a 7-billion-parameter text-only baseline. Grounding audits, not accuracy, should determine clinical deployment.

arXiv cs.CL · 8d ago

Automated Prompt Optimization for LLM Game Agents

A new framework automates prompt refinement for LLM agents by splitting the observation-to-action pipeline into goal-conditioned and action selection modules. It uses an LLM-driven evolutionary loop to iteratively improve prompts based on environment feedback, achieving up to 72.5% success on PutNext where prior agents failed, without model fine-tuning.
