OpenAI — korshunov.ai

User as Engram: Local Parametric Edits for Personal Memory

User as Engram proposes storing per-user facts as surgical, hash-keyed edits to a memory table, while reasoning stays in a shared adapter. The design achieves 5.6× higher indirect-reasoning accuracy and maintains base-level reasoning performance, with a memory footprint 33,000× smaller than per-user LoRA. Edits from different users are disjoint and compose losslessly, and the approach outperforms retrieval pipelines beyond 100 facts.

arXiv cs.AI · 7d ago
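
The composition claim in the User as Engram summary can be illustrated with a toy sketch. The summary does not describe the paper's hashing scheme or table layout, so everything here is an assumption made for illustration: the names `slot_for`, `make_edit`, and `compose`, the table size, and the use of SHA-256. It is not the authors' implementation. It only shows why edits that touch disjoint rows can be merged in any order without loss.

```python
import hashlib

# Hypothetical table size; the summary does not give the real one.
TABLE_SLOTS = 2**32


def slot_for(user_id: str, fact_key: str, slots: int = TABLE_SLOTS) -> int:
    """Map a (user, fact) pair to a table row with a stable hash."""
    digest = hashlib.sha256(f"{user_id}\x00{fact_key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % slots


def make_edit(user_id: str, facts: dict[str, list[float]]) -> dict[int, list[float]]:
    """A user's edit: a sparse set of rows to overwrite in the memory table."""
    return {slot_for(user_id, key): list(vec) for key, vec in facts.items()}


def compose(*edits: dict[int, list[float]]) -> dict[int, list[float]]:
    """Merge edits; refuse to merge if two edits touch the same row."""
    merged: dict[int, list[float]] = {}
    for edit in edits:
        clash = merged.keys() & edit.keys()
        if clash:
            raise ValueError(f"edits overlap on rows {sorted(clash)}")
        merged.update(edit)
    return merged


alice = make_edit("alice", {"home_city": [0.1, 0.9], "pet": [0.7, 0.2]})
bob = make_edit("bob", {"home_city": [0.4, 0.4]})

both = compose(alice, bob)
# Disjoint rows mean merge order is irrelevant and every row survives unchanged.
assert both == compose(bob, alice)
assert all(both[row] == vec for row, vec in alice.items())
```

Because each edit is a small set of rows rather than a full adapter, the per-user state stays tiny, which is the intuition behind the footprint comparison with per-user LoRA in the summary.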

OneCanvas: 3D Scene Understanding via Panoramic Reprojection

OneCanvas enables 3D scene understanding in Vision-Language Models by aggregating patch features onto a panoramic canvas using 3D world coordinates. It achieves state-of-the-art results on SQA3D and VSI-Bench, with strong generalization on SPBench, using significantly less training compute than prior methods.

arXiv cs.AI · 7d ago

Data Intelligence Agents Enable Autonomous Data Querying

Data Intelligence Agents (DIA) deploy autonomous coding agents to streamline enterprise data workflows. The Query Generator matches or exceeds top published results on seven SQL benchmarks across four dialects, generalizing through natural-language instructions and an execution-based architecture.

arXiv cs.AI · 7d ago

ScenA: Reference-Driven Multi-Speaker Audio Scene Generation

ScenA conditions a text-to-audio foundation model on multiple reference voices and a natural language scene prompt to generate realistic multi-speaker conversations. It addresses the 'Reference Shortcut' issue by using a high-noise-biased training schedule, ensuring speaker assignment relies on text prompts rather than acoustic similarity. Evaluated on CoVoMix2-Dialogue, ScenA outperforms existing systems in speaker-binding and produces rich, naturalistic audio with overlapping speech and ambient noise.

arXiv cs.AI · 7d ago

Rubric-Conditioned Self-Distillation Framework

Rubric-Conditioned Self-Distillation introduces a framework that uses structured rubrics to provide fine-grained, token-level feedback during self-distillation of reasoning language models. By conditioning teacher models on rubric-level criteria, it enables more precise credit assignment than scalar rewards, outperforming GRPO and OPSD by 1.0 and 0.9 points on average across science reasoning benchmarks.

arXiv cs.CL · 7d ago

Turing-RL: Learning User Simulators with Turing Rewards

Turing-RL introduces a reinforcement learning method using an LLM judge to evaluate how indistinguishable generated responses are from real user inputs. It outperforms baseline methods in both LLM and human evaluations across chat and Reddit forum domains, demonstrating that optimizing for indistinguishability improves user simulator performance.

arXiv cs.CL · 7d ago

OmniAgent: Native Active Perception for Omni-Modal Understanding

OmniAgent introduces a POMDP-based iterative Observation-Thought-Action cycle for video understanding, enabling on-demand action execution to selectively distill audio-visual cues into persistent textual memory. It achieves state-of-the-art performance on ten benchmarks, with a 7B agent outperforming a 10× larger Qwen2.5-VL-72B model on LVBench (50.5% vs. 47.3%).

arXiv cs.LG · 7d ago

TAPO: Self-Distillation with Micro-Reflective Trajectories

TAPO advances self-distillation by constructing explicit micro-reflective trajectories that retain erroneous reasoning and insert natural-language diagnoses. These trajectories, derived from correct and incorrect model rollouts, provide fine-grained error corrections anchored in the model's own reasoning, improving both first-pass reasoning and error correction compared to GRPO.

arXiv cs.LG · 7d ago

REVES: Augmented Training for Test-Time Scaling

REVES introduces a two-stage iterative framework that enhances LLM reasoning through sequential revision and verification. It achieves +6.5 points over RL baselines and +4.0 points over standard multi-turn training on LiveCodeBench, using a 4B base model with fewer rollouts than large evolutionary systems. The method improves error correction and generalizes to out-of-distribution puzzles like n_queens and mini_sudoku.

arXiv cs.LG · 7d ago

Unsupervised Reward Optimization for Protein Language Models

A new framework enables protein language models to generate controllable protein sequences without labeled data or wet-lab validation. It uses task-agnostic rewards based on model uncertainty and semantic consistency to guide generation, with Soft and Binarized Reward Optimization outperforming baselines in coverage and controllability across diverse conditions.

arXiv cs.LG · 7d ago

EfficientRollout: System-Aware Self-Speculative Decoding for RL Rollouts

EfficientRollout introduces a self-speculative decoding framework that reduces rollout and end-to-end latency by up to 19.6% and 12.7% respectively, without compromising final model quality. It uses a quantized drafter derived from the target model and integrates a system-aware toggle policy to avoid compute-bound regimes, enabling effective speculation during evolving policy generations.

arXiv cs.LG · 7d ago
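
The draft-then-verify loop behind the EfficientRollout summary can be sketched as follows. Several parts are assumptions. The summary does not state the acceptance rule, so this uses plain greedy verification, where every emitted token is the target's own choice. The quantized drafter is replaced by a toy function that disagrees with the target on some prefixes. The toggle `should_speculate` and its `max_batch` threshold are hypothetical stand-ins for the system-aware policy. A real system would verify all drafted tokens in one batched forward pass, whereas this sketch calls the target token by token for readability.

```python
from typing import Callable, List

Model = Callable[[List[int]], int]  # maps a token prefix to the next token


def greedy_decode(target: Model, prompt: List[int], n_new: int) -> List[int]:
    """Reference decoder: the target model alone, one token at a time."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def should_speculate(batch_size: int, max_batch: int = 8) -> bool:
    """Hypothetical toggle: skip speculation once batches are large enough
    that verification is likely compute-bound."""
    return batch_size <= max_batch


def speculative_decode(target: Model, drafter: Model, prompt: List[int],
                       n_new: int, k: int = 4, batch_size: int = 1) -> List[int]:
    seq = list(prompt)
    goal = len(prompt) + n_new
    while len(seq) < goal:
        if not should_speculate(batch_size):
            seq.append(target(seq))  # fall back to ordinary decoding
            continue
        # Draft k tokens with the cheap model, each conditioned on earlier drafts.
        draft: List[int] = []
        for _ in range(k):
            draft.append(drafter(seq + draft))
        # Verify: always keep the target's token, stop at the first disagreement.
        for guess in draft:
            truth = target(seq)
            seq.append(truth)
            if truth != guess or len(seq) == goal:
                break
    return seq


def toy_target(seq: List[int]) -> int:
    return (31 * sum(seq) + len(seq)) % 50


def toy_drafter(seq: List[int]) -> int:
    # Stand-in for a quantized copy: wrong whenever the prefix length is a multiple of 5.
    tok = toy_target(seq)
    return (tok + 1) % 50 if len(seq) % 5 == 0 else tok


spec = speculative_decode(toy_target, toy_drafter, [1, 2, 3], n_new=20)
assert spec == greedy_decode(toy_target, [1, 2, 3], n_new=20)
```

Under greedy verification the output is identical to target-only decoding, so any gain comes purely from latency, which matches the summary's claim that final model quality is not compromised.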

ViGOS: Decoupling Perception and Reasoning in Multimodal On-Policy Self-Distillation

ViGOS introduces a visually grounded on-policy self-distillation framework for multimodal large language models. It decouples perception and reasoning by using an image-only teacher for visual descriptions and a reasoning teacher for final outputs, reducing reliance on text-only references. This approach improves image-grounded performance across multiple vision-language benchmarks.

arXiv cs.CL · 7d ago

Misfired Alignment in LLMs: A Quantitative Study

A new study introduces VETO, a benchmark of 2,032 BBQ-derived contrastive pairs, to quantify misfired alignment in large language models. It defines the Misfired Alignment Rate (MAR) and finds that all benchmarked LLMs exhibit MARs between 4.7% and 18.9%, while human participants achieve 0%. The research shows alignment cues can amplify these failures, with evidence suppression occurring in late layers of models and emerging after instruction training.

arXiv cs.CL · 7d ago

Frustrated Synchronization Network Outperforms Transformers

The Frustrated Synchronization Network (FSN) achieves lower validation loss than a RoPE-SwiGLU transformer at every epoch on character-level text and code tasks. At one million parameters, FSN converges to a validation loss of 1.5953 ± 0.0014, outperforming the transformer's converged loss of 1.611. This advantage persists up to four million parameters, with ongoing evaluations beyond that scale.

arXiv cs.CL · 7d ago

Output Vector Editing Reduces Memorization in LLMs

A new method called output vector editing minimally modifies MLP neurons' output vectors to suppress memorized sequences in large language models, achieving up to 87.9% suppression in OLMo-7B. This approach outperforms zeroing neuron activations by a factor of 2.7 and works across four models from 36-7B parameters, with success rates scaling with model size and showing consistent performance across architectures.

arXiv cs.CL · 7d ago

HandwritingAgent: Language-Driven Handwriting Synthesis in SVG

HandwritingAgent synthesizes natural handwriting in SVG format without style-specific training. It uses a large reasoning model to generate stroke sequences in a grid canvas, conditioned on text input and a reference style image, enabling efficient, controllable, and generalizable handwriting generation.

arXiv cs.CL · 7d ago

Decoupling Search from Reasoning in LLM Agents

Decoupled Search Grounding (DSG) separates search functionality from reasoning models, enabling vendor-agnostic, tunable, and reusable search grounding. DSG achieves near-native accuracy on SimpleQA with 91% lower search cost and 99.4% warm-cache hit rate, while reducing latency by 68% and preserving concise output contracts.

arXiv cs.CL · 7d ago

GraphPO: Graph-based Policy Optimization for Reasoning Models

GraphPO introduces a directed acyclic graph framework to represent reasoning rollouts, merging semantically equivalent paths to reduce redundant exploration. It assigns efficiency and correctness advantages to edges, improving inference efficiency and process supervision while reducing advantage-estimation variance. Experiments show GraphPO outperforms chain- and tree-based methods on three LLMs across reasoning and agentic search tasks under identical token or response budgets.
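
The path-merging idea in the GraphPO summary can be made concrete with a small sketch. The summary does not say how semantic equivalence is decided, so `canonical` below is a hypothetical stand-in that only normalizes case and whitespace, and `build_dag` is an illustrative name rather than anything from the paper. Nodes are keyed by depth and canonical step, which is one simple way to keep the merged graph acyclic. The efficiency and correctness advantages on edges are not modeled here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Node = Tuple[int, str]  # (depth, canonical step text)


def canonical(step: str) -> str:
    """Hypothetical equivalence test: normalise case and spacing only."""
    return " ".join(step.lower().split())


def build_dag(rollouts: List[List[str]]) -> Dict[Tuple[Node, Node], int]:
    """Merge rollouts into a DAG and count how many rollouts use each edge."""
    root: Node = (0, "<root>")
    edges: Dict[Tuple[Node, Node], int] = defaultdict(int)
    for steps in rollouts:
        prev = root
        for depth, step in enumerate(steps, start=1):
            node = (depth, canonical(step))
            edges[(prev, node)] += 1  # edges only go from depth d to d + 1
            prev = node
    return dict(edges)


rollouts = [
    ["Let x = 3", "Compute x + 2", "Answer: 5"],
    ["let  x = 3", "Compute 2 + x", "Answer: 5"],
    ["Let x = 3", "compute x + 2", "Answer: 5"],
]
dag = build_dag(rollouts)
nodes = {node for edge in dag for node in edge}

# Two different middle steps reconverge on one shared answer node,
# which a prefix tree would have kept as two separate leaves.
assert len(nodes) == 5
assert len(dag) == 5
```

The per-edge visit counts are the kind of shared statistic that a graph representation exposes and a set of independent chains does not.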
