arXiv cs.CL — korshunov.ai

Source · arXiv cs.CL

HydraHead introduces a head-level hybridization of Full and Linear Attention, leveraging interpretability to select retrieval-critical heads and fuse outputs via a scale-normalized module. Trained on 15B tokens, it achieves over 69% improvement over baseline at 512K context length, outperforming layer-wise hybrids and approaching Qwen3.5's performance on long-context tasks.

arxiv arXiv cs.CL · 6d ago

Causal Activation Directions for Mitigating Emergent Misalignment in Language Models

Fine-tuning language models on insecure code causes emergent misalignment. A shared activation direction across four model families achieves 99.6% separation of aligned and misaligned activations, and subtracting it reduces code spillover by 21-51 points. Cross-architecture transfer shows behavioral suppression but lacks specificity, with within-model directions being causally actionable and cross-model directions only causally real.

arxiv arXiv cs.CL · 7d ago

STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability

STARE addresses policy entropy collapse in GRPO-based reinforcement learning by identifying entropy-critical token subsets via surprisal quantiles and reweighting their advantages. It maintains stable policy entropy across model scales and tasks, outperforming DAPO and other baselines by 4%-8% on AIME24 and AIME25, with consistent exploration-exploitation balance.

arxiv arXiv cs.CL · 7d ago

Rubric-Conditioned Self-Distillation Framework

Rubric-Conditioned Self-Distillation introduces a framework that uses structured rubrics to provide fine-grained, token-level feedback during self-distillation of reasoning language models. By conditioning teacher models on rubric-level criteria, it enables more precise credit assignment than scalar rewards, outperforming GRPO and OPSD by 1.0 and 0.9 points on average across science reasoning benchmarks.

arxiv arXiv cs.CL · 7d ago

Turing-RL: Learning User Simulators with Turing Rewards

Turing-RL introduces a reinforcement learning method using an LLM judge to evaluate how indistinguishable generated responses are from real user inputs. It outperforms baseline methods in both LLM and human evaluations across chat and Reddit forum domains, demonstrating that optimizing for indistinguishability improves user simulator performance.

arxiv arXiv cs.CL · 7d ago

OmniAgent: Native Active Perception for Omni-Modal Understanding

OmniAgent introduces a POMDP-based iterative Observation-Thought-Action cycle for video understanding, enabling on-demand action execution to selectively distill audio-visual cues into persistent textual memory. It achieves state-of-the-art performance on ten benchmarks, with a 7B agent outperforming a 10× larger Qwen2.5-VL-72B model on LVBench (50.5% vs. 47.3%).

arxiv arXiv cs.CL · 7d ago

PragReST: Self-Reinforcing Counterfactual Reasoning for Pragmatic Language Understanding

PragReST is a self-supervised framework that enhances large language models' pragmatic reasoning by generating counterfactual reasoning traces and training via supervised fine-tuning and reinforcement learning. It outperforms baseline models on four pragmatic benchmarks, improving Qwen3-8B and Qwen3-14B by 5.37% and 5-5.50% accuracy respectively, and maintains strong performance on general-knowledge and mathematical reasoning tasks.

arxiv arXiv cs.CL · 7d ago

Misfired Alignment in LLMs: A Quantitative Study

A new study introduces VETO, a benchmark of 2,032 BBQ-derived contrastive pairs, to quantify misfired alignment in large language models. It defines the Misfired Alignment Rate (MAR) and finds that all benchmarked LLMs exhibit MARs between 4.7% and 18.9%, while human participants achieve 0%. The research shows alignment cues can amplify these failures, with evidence suppression occurring in late layers of models and emerging after instruction training.

arxiv arXiv cs.CL · 7d ago

Frustrated Synchronization Network Outperforms Transformers

The Frustrated Synchronization Network (FSN) achieves lower validation loss than a RoPE-SwiGLU transformer at every epoch on character-level text and code tasks. At one million parameters, FSN converges to a validation loss of 1.5953 ± 0.0014, outperforming the transformer's converged loss of 1.611. This advantage persists up to four million parameters, with ongoing evaluations beyond that scale.

arxiv arXiv cs.CL · 7d ago

Output Vector Editing Reduces Memorization in LLMs

A new method called output vector editing minimally modifies MLP neurons' output vectors to suppress memorized sequences in large language models, achieving up to 87.9% suppression in OLMo-7B. This approach outperforms zeroing neuron activations by a factor of 2.7 and works across four models from 36-7B parameters, with success rates scaling with model size and showing consistent performance across architectures.

arxiv arXiv cs.CL · 7d ago

HandwritingAgent: Language-Driven Handwriting Synthesis in SVG

HandwritingAgent synthesizes natural handwriting in SVG format without style-specific training. It uses a large reasoning model to generate stroke sequences in a grid canvas, conditioned on text input and a reference style image, enabling efficient, controllable, and generalizable handwriting generation.

arxiv arXiv cs.CL · 7d ago

Data Recipe Boosts Long-Context Reasoning in LLMs

A data-centric approach improves long-context reasoning in large language models, using eight curated datasets with 14K examples across retrieval, multi-evidence synthesis, and reasoning tasks. When paired with minimal outcome-based GRPO training, it achieves average gains of +7.2 to +6.4 points on seven benchmarks, outperforming prior RL training sets, and enhances agentic performance by +4.8 and +7.0 points on GAIA and BrowseComp respectively.

arxiv arXiv cs.CL · 7d ago

REVES: Augmented Training for Test-Time Scaling

REVES introduces a two-stage iterative framework that enhances large language model reasoning through sequential revision and verification. It achieves +6.5 points over RL baselines and +4.0 points over standard multi-turn training on LiveCodeBench, using a 4B base model with fewer rollouts than larger systems. The method improves error correction and generalizes to out-of-distribution puzzles like n_queens and mini_sudoku.

arxiv arXiv cs.CL · 7d ago

SenFlow: Advanced AI-Generated Text Detection in Hybrid Documents

SenFlow introduces a novel method for detecting AI-generated text in hybrid documents by modeling inter-sentence dependencies. It achieves state-of-the-art performance on MOSAIC, a benchmark of 16,000 documents from PubMed and XSum, with a +4.15 pp Macro-F1 gain on cross-domain transfer. SenFlow reveals that AI-generated content still exhibits generator-dependent sentence-length patterns, exploitable by sentence-level detectors despite perplexity filtering.

arxiv arXiv cs.CL · 7d ago

Decoupling Search from Reasoning in LLM Agents

Decoupled Search Grounding (DSG) separates search functionality from reasoning models, enabling vendor-agnostic, tunable, and reusable search grounding. DSG achieves near-native accuracy on SimpleQA with 91% lower search cost and 99.4% warm-cache hit rate, while reducing latency by 68% and preserving concise output contracts.

arxiv arXiv cs.CL · 7d ago

GraphPO: Graph-based Policy Optimization for Reasoning Models

GraphPO introduces a directed acyclic graph framework to represent reasoning rollouts, merging semantically equivalent paths to reduce redundant exploration. It assigns efficiency and correctness advantages to edges, improving inference efficiency and process supervision while reducing advantage-estimation variance. Experiments show GraphPO outperforms chain- and tree-based methods on three LLMs across reasoning and agentic search tasks under identical token or response budgets.

arxiv arXiv cs.CL · 8d ago

LegalHalluLens: Auditing Hallucinations in Legal AI

LegalHalluLens introduces a framework to audit AI hallucinations in legal contexts by analyzing typed hallucination profiles across four claim categories. It reveals a 38-40 point gap between obligation/numeric and temporal claims, and shows two systems with identical 52% hallucination rates can have opposite risk directions. The framework uses a Risk Direction Index and calibrated debate pipelines to reduce fabricated detections by 45% and improve accountability in legal AI deployment.

arxiv arXiv cs.CL · 8d ago

ProvenanceGuard: Source-Aware Factuality Verification for MCP-Based LLM Agents

ProvenanceGuard introduces a source-aware verifier for MCP-based LLM agents that detects cross-source conflation by routing claims to specific evidence sources and comparing stated attribution with actual source ownership. It achieves block F1 of 0.802 and source accuracy of 0.858 on 260 source-eligible claims, outperforming source-blind baselines, and detects all injected attribution swaps in 50 clinical probes.

arxiv arXiv cs.CL · 8d ago

SkillWeaver: Compositional Skill Routing for LLM Agents

SkillWeaver introduces a decompose-retrieve-compose framework for LLM agents, formalizing the Compositional Skill Routing problem. It achieves 67.7% decomposition accuracy via Iterative Skill-Aware Decomposition (SAD), improving from 51.0% with a p-value of less than 10^-6, and reduces context window usage by over 99%.

arxiv arXiv cs.CL · 8d ago

Handlebars Triple-Brace Injection Exploits Structural Role Delimiters

Handlebars' triple-brace interpolation fails to protect against structural role injection, as HTML escaping only neutralizes angle-bracket delimiters. It leaves colon and Markdown hash delimiters intact, enabling attackers to hijack model turns. The default escaping provides no protection for most role delimiter families and cannot replace a structural separation of instructions and data.

HydraHead: Head-Level Hybrid Attention for Long-Context Performance

Causal Activation Directions for Mitigating Emergent Misalignment in Language Models

STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability

Rubric-Conditioned Self-Distillation Framework

Turing-RL: Learning User Simulators with Turing Rewards

OmniAgent: Native Active Perception for Omni-Modal Understanding

PragReST: Self-Reinforcing Counterfactual Reasoning for Pragmatic Language Understanding

Misfired Alignment in LLMs: A Quantitative Study

Frustrated Synchronization Network Outperforms Transformers

Output Vector Editing Reduces Memorization in LLMs

HandwritingAgent: Language-Driven Handwriting Synthesis in SVG

Data Recipe Boosts Long-Context Reasoning in LLMs

REVES: Augmented Training for Test-Time Scaling

SenFlow: Advanced AI-Generated Text Detection in Hybrid Documents

Decoupling Search from Reasoning in LLM Agents

GraphPO: Graph-based Policy Optimization for Reasoning Models

LegalHalluLens: Auditing Hallucinations in Legal AI

ProvenanceGuard: Source-Aware Factuality Verification for MCP-Based LLM Agents

SkillWeaver: Compositional Skill Routing for LLM Agents

Handlebars Triple-Brace Injection Exploits Structural Role Delimiters