Google DeepMind — korshunov.ai

Misfired Alignment in LLMs: A Quantitative Study

A new study introduces VETO, a benchmark of 2,032 BBQ-derived contrastive pairs, to quantify misfired alignment in large language models. It defines the Misfired Alignment Rate (MAR) and finds that every benchmarked LLM has a MAR between 4.7% and 18.9%, while human participants score 0%. The study also shows that alignment cues can amplify these failures, and that evidence suppression occurs in the models' late layers and emerges after instruction training.

arXiv cs.CL · 7d ago

Frustrated Synchronization Network Outperforms Transformers

The Frustrated Synchronization Network (FSN) achieves lower validation loss than a RoPE-SwiGLU transformer at every epoch on character-level text and code tasks. At one million parameters, FSN converges to a validation loss of 1.5953 ± 0.0014, outperforming the transformer's converged loss of 1.611. This advantage persists up to four million parameters, with ongoing evaluations beyond that scale.

arXiv cs.CL · 7d ago

Output Vector Editing Reduces Memorization in LLMs

A new method called output vector editing minimally modifies MLP neurons' output vectors to suppress memorized sequences in large language models, achieving up to 87.9% suppression in OLMo-7B. This approach outperforms zeroing neuron activations by a factor of 2.7 and works across four models from 36-7B parameters, with success rates scaling with model size and showing consistent performance across architectures.

arXiv cs.CL · 7d ago

HandwritingAgent: Language-Driven Handwriting Synthesis in SVG

HandwritingAgent synthesizes natural handwriting in SVG format without style-specific training. It uses a large reasoning model to generate stroke sequences in a grid canvas, conditioned on text input and a reference style image, enabling efficient, controllable, and generalizable handwriting generation.

arXiv cs.CL · 7d ago

REVES: Augmented Training for Test-Time Scaling

REVES introduces a two-stage iterative framework that enhances large language model reasoning through sequential revision and verification. It achieves +6.5 points over RL baselines and +4.0 points over standard multi-turn training on LiveCodeBench, using a 4B base model with fewer rollouts than larger systems. The method improves error correction and generalizes to out-of-distribution puzzles like n_queens and mini_sudoku.

arXiv cs.CL · 7d ago

Decoupling Search from Reasoning in LLM Agents

Decoupled Search Grounding (DSG) separates search functionality from reasoning models, enabling vendor-agnostic, tunable, and reusable search grounding. DSG achieves near-native accuracy on SimpleQA with 91% lower search cost and a 99.4% warm-cache hit rate, while reducing latency by 68% and preserving concise output contracts.
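
The warm-cache hit rate quoted above measures how often a search request is answered from a cache instead of the search backend. The summary does not describe how DSG implements caching, so the following is only a minimal sketch of the general idea. The class name `CachedSearch`, the `backend` callable, and the case and whitespace normalization are all assumptions made for illustration.

```python
class CachedSearch:
    """Hypothetical wrapper that keeps search separate from the reasoning model."""

    def __init__(self, backend):
        self.backend = backend  # any callable: query string -> results
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def search(self, query):
        # Assumed normalization: lowercase and collapse whitespace so that
        # trivially different spellings of a query share one cache entry.
        key = " ".join(query.lower().split())
        if key in self.cache:
            self.hits += 1
        else:
            self.misses += 1
            self.cache[key] = self.backend(key)
        return self.cache[key]

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Because the wrapper only needs a callable, the backend can be swapped without touching the model that consumes the results, which is one way to read "vendor-agnostic" in the summary.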

arXiv cs.CL · 7d ago

GraphPO: Graph-based Policy Optimization for Reasoning Models

GraphPO introduces a directed acyclic graph framework to represent reasoning rollouts, merging semantically equivalent paths to reduce redundant exploration. It assigns efficiency and correctness advantages to edges, improving inference efficiency and process supervision while reducing advantage-estimation variance. Experiments show GraphPO outperforms chain- and tree-based methods on three LLMs across reasoning and agentic search tasks under identical token or response budgets.
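
To illustrate how merging equivalent reasoning paths turns a set of rollouts into a directed acyclic graph, here is a toy sketch. It assumes, purely for illustration, that two partial rollouts are equivalent when they have established the same set of steps, regardless of order. The summary does not say how GraphPO decides semantic equivalence or how it computes edge advantages, so `canonical` and `build_rollout_dag` are hypothetical helpers and only edge visit counts are tracked here.

```python
from collections import Counter


def canonical(step):
    # Assumed stand-in for semantic equivalence: case and whitespace folding.
    return " ".join(step.lower().split())


def build_rollout_dag(rollouts):
    """Merge rollouts into a graph whose nodes are sets of established steps.

    Returns a Counter mapping (state, next_state) edges to visit counts.
    States only ever grow, so the resulting graph cannot contain a cycle.
    """
    edge_counts = Counter()
    for steps in rollouts:
        state = frozenset()
        for step in steps:
            next_state = state | {canonical(step)}
            if next_state != state:  # skip repeated steps (would be self-loops)
                edge_counts[(state, next_state)] += 1
            state = next_state
    return edge_counts


rollouts = [
    ["x = 2", "y = 3", "x + y = 5"],
    ["Y = 3", "x = 2", "x + y = 5"],  # same facts, different order
]
dag = build_rollout_dag(rollouts)
```

In this example the two rollouts reach the same intermediate state after two steps, so the final step is stored once as a shared edge with a visit count of 2 rather than explored twice.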

arXiv cs.AI · 7d ago

Self-Conditioned Credit Assignment for RL with Verifiable Rewards

SC-GRPO uses per-token KL divergence from self-conditioned trajectories to weight gradients in reinforcement learning. It outperforms GRPO by 8.1% and DAPO by 5.9% across math, code, and agentic tasks, with superior out-of-distribution performance and better results than OPD.
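
As a rough sketch of the idea, the snippet below combines the standard group-relative advantage used in GRPO (reward minus group mean, divided by group standard deviation) with per-token weights derived from a KL divergence. How SC-GRPO actually builds its self-conditioned trajectories, which KL direction it uses, and how it normalizes the weights are not stated in the summary. The function names, the KL direction, and the sum-to-one normalization are assumptions.

```python
import math


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize each reward within its rollout group."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


def token_kl(p, q):
    """KL(p || q) for two categorical distributions over the same vocabulary.

    Assumes q is strictly positive wherever p is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kl_token_weights(policy_dists, conditioned_dists):
    """Assumed weighting: each token's share of the total per-token KL."""
    kls = [token_kl(p, q) for p, q in zip(policy_dists, conditioned_dists)]
    total = sum(kls)
    if total == 0:
        return [1.0 / len(kls)] * len(kls)  # fall back to uniform credit
    return [k / total for k in kls]


def token_level_advantages(advantage, weights):
    # Spread one sequence-level advantage over tokens according to the weights.
    return [advantage * w for w in weights]
```

Under these assumptions, tokens where the two distributions agree receive no credit, and the gradient signal concentrates on tokens where conditioning changes the prediction.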

arXiv cs.AI · 7d ago

WorldLines: Benchmarking Long-Horizon Embodied Agent Memory

WorldLines introduces a project-driven benchmark for long-horizon embodied household assistance, capturing extended household traces with dialogues, actions, and state changes. It enables evidence-linked samples for Memory QA and Embodied Task Planning, and proposes ObsMem, an observer-grounded memory framework that supports visibility-aware memories and state-aware decisions. Experiments highlight challenges in partial observability and memory translation, with ObsMem providing a stronger reference architecture for such settings.

arXiv cs.AI · 7d ago

Skill-Guided Continuation Distillation for GUI Agents

SGCD introduces an iterative framework to improve GUI agents by addressing supervision gaps in off-trajectory states. It extracts skills from both successful and failed rollouts, using them to guide policy continuations that are mixed with expert trajectories. On OSWorld-Verified, SGCD boosts the success rates of three base models from the low 30% range to over 50%.

arXiv cs.AI · 7d ago

RTSGameBench: An RTS Benchmark for Strategic Reasoning

RTSGameBench addresses limitations in existing RTS benchmarks by offering diverse gameplay, targeted competency diagnosis, and self-evolving scenario generation. It evaluates vision-language models in strategic reasoning under uncertainty, revealing that state-of-the-art models struggle with multiagent coordination and large-scale tasks.

arXiv cs.AI · 7d ago

ThinkDeception: Interpretable Multimodal Deception Detection Framework

ThinkDeception introduces a progressive reinforcement learning framework that enables interpretable multimodal deception detection. It leverages a step-by-step annotated Chain of Thought dataset and proposes Visual-Audio Consistency Group Relative Policy Optimization with a dynamic curriculum, enhancing reasoning quality and outperforming existing methods on mainstream benchmarks.

arXiv cs.AI · 7d ago

TRAP: Benchmark for Task-completion and Resistance to Active Privacy-extraction

TRAP evaluates how well models complete tasks using private data without leaking it. Across 22 models, all show non-trivial privacy leakage, with instruction-following ability linked to higher leakage. Structural private field isolation prevents leakage by replacing private fields with hash keys, maintaining task accuracy without sacrificing privacy.
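
The field-isolation idea can be sketched as follows: private values are swapped for opaque keys before the record reaches the model, and swapped back only after the model has produced its output. This is a minimal sketch under stated assumptions. The summary says only that private fields are replaced with hash keys, so the key format `<PRIV:...>`, the salt, the `vault` dictionary, and both function names are hypothetical.

```python
import hashlib


def isolate_private_fields(record, private_fields, salt="demo-salt"):
    """Return (safe_record, vault); private values become opaque hash keys."""
    safe, vault = {}, {}
    for name, value in record.items():
        if name in private_fields:
            raw = f"{salt}:{name}:{value}".encode()
            key = "<PRIV:" + hashlib.sha256(raw).hexdigest()[:12] + ">"
            vault[key] = value  # kept outside the model's context
            safe[name] = key
        else:
            safe[name] = value
    return safe, vault


def restore_private_fields(text, vault):
    """Substitute the real values back into model output, outside the model."""
    for key, value in vault.items():
        text = text.replace(key, str(value))
    return text
```

The model only ever sees `safe`, so it can route or copy a private field by its key without having the underlying value available to leak.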

arXiv cs.AI · 7d ago

RODS: Reward-Driven Online Data Synthesis for Multi-Turn Tool-Use Agents

RODS addresses sample depletion in multi-turn tool-use RL by using reward variance to detect capability boundaries. It synthesizes new data in real time, matching structural complexity of boundary samples, and maintains a dynamic replay buffer that co-evolves with the policy. RODS achieves performance comparable to a 17K-sample offline pipeline with 20x fewer trajectories.
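
One way to picture reward variance as a boundary detector: with binary rewards, a prompt the policy always solves or always fails has zero variance across rollouts, while a prompt it solves about half the time has the maximum variance of 0.25. The sketch below selects such prompts. The threshold value and the function name are assumptions, and RODS's actual selection rule and synthesis step are not described in the summary.

```python
from statistics import pvariance


def boundary_prompts(rewards_by_prompt, min_variance=0.15):
    """Return prompt ids whose rollout rewards vary enough to suggest the
    policy sometimes succeeds and sometimes fails on them."""
    selected = []
    for prompt_id, rewards in rewards_by_prompt.items():
        # Population variance over the rollouts sampled for this prompt.
        if len(rewards) >= 2 and pvariance(rewards) >= min_variance:
            selected.append(prompt_id)
    return selected


rollout_rewards = {
    "always_solved": [1, 1, 1, 1],
    "never_solved": [0, 0, 0, 0],
    "sometimes_solved": [1, 0, 1, 0],
}
selected = boundary_prompts(rollout_rewards)
```

Here only `sometimes_solved` passes the filter, which is the kind of sample a synthesis step would then try to imitate in structure.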

arXiv cs.AI · 7d ago

AdsMind: Physics-Grounded Multi-Agent System for Adsorption Discovery

AdsMind is a closed-loop multi-agent system that uses machine learning force fields and feedback to correct errors in adsorption configuration searches on catalyst surfaces. It achieves success rates of 100% and 98.8% on the AA20 and OCD-GMAE62 benchmarks, respectively, reduces energy dispersion 14-fold compared to baselines, and maintains correct adsorption-energy signs in DFT validation, outperforming open-loop LLM agents.

arXiv cs.LG · 8d ago

NoiseTilt: Noise-Tilted Reverse Kernels for Diffusion Reward Alignment

NoiseTilt introduces NTRK, a reward-guided diffusion sampler that injects reward gradients via the noise term without altering the reverse kernel. By using a whitening operator, NTRK safely biases noise toward high reward, preserving sample quality while maintaining strong guidance. On aesthetic generation, NTRK achieves superior reward performance with 25 NFEs, reducing compute by 20× compared to state-of-the-art baselines.

arXiv cs.LG · 8d ago

Compositional Generalization in Language Model Reasoning

A hierarchical latent selection model shows that supervised fine-tuning and reinforcement learning work together to enable compositional generalization in language models. SFT provides raw module materials, while RL identifies and recombines atomic modules from compound traces to solve new problems. Training on compound traces leads to stronger generalization than isolated module training, and an effective protocol is found where SFT ensures module coverage and RL drives exploration of novel compositions.

arXiv cs.CL · 8d ago

ProvenanceGuard: Source-Aware Factuality Verification for MCP-Based LLM Agents

ProvenanceGuard introduces a source-aware verifier for MCP-based LLM agents that detects cross-source conflation by routing claims to specific evidence sources and comparing stated attribution with actual source ownership. It achieves block F1 of 0.802 and source accuracy of 0.858 on 260 source-eligible claims, outperforming source-blind baselines, and detects all injected attribution swaps in 50 clinical probes.
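
Comparing stated attribution with actual source ownership can be sketched in a few lines. In this toy version, a claim is "owned" by every source whose text contains it, and a claim is flagged when the source it cites is not among its owners. Plain substring matching stands in for whatever verification ProvenanceGuard really performs. The function names, labels, and example source names are hypothetical.

```python
def normalize(text):
    return " ".join(text.lower().split())


def owners_of(claim, sources):
    """Names of all sources whose text contains the claim (toy matcher)."""
    needle = normalize(claim)
    return {name for name, text in sources.items() if needle in normalize(text)}


def check_attribution(claim, stated_source, sources):
    owners = owners_of(claim, sources)
    if not owners:
        return "unsupported"    # no source backs the claim at all
    if stated_source in owners:
        return "ok"             # cited source really contains the claim
    return "misattributed"      # true somewhere, but credited to the wrong source


sources = {
    "lab_report": "Potassium 5.9 mmol/L. Sodium 140 mmol/L.",
    "discharge_note": "Patient prescribed lisinopril 10 mg daily.",
}
```

A source-blind checker would accept the lisinopril claim no matter which source it cites, because the text exists somewhere. The `misattributed` outcome is what separates the source-aware check.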

arXiv cs.CL · 8d ago

d-OPSD: On-policy Self-distillation for Diffusion LLMs

d-OPSD is the first on-policy self-distillation framework designed for diffusion LLMs. It uses self-generated answers as suffix conditioning and step-level supervision, enabling efficient post-training with only about 10% of RLVR's optimization steps while outperforming RLVR and SFT baselines on four reasoning benchmarks.
