Research paper
arxiv arXiv cs.AI · 8h ago

SQLConductor: Search-to-Policy Learning for Step-wise Text-to-SQL Orchestration

The authors propose SQLConductor, a step-wise orchestration learning framework for Text-to-SQL that addresses the limitations of fixed pipelines and static plan-then-execute methods. This system formulates subtasks as specialized actions and trains a policy model to select the next action based on intermediate artifacts and feedback. To learn this policy, the framework introduces Search-to-Policy Learning, which utilizes Monte Carlo Tree Search to explore candidate workflows and stability estimation to identify robust supervision. The policy model is trained using Stability-weighted Supervised Fine-tuning to prioritize high-quality orchestration patterns and further enhanced through Curriculum Reinforcement Learning. This approach transforms offline workflow search into a deployable policy for step-wise orchestration at inference time. Experiments on BIRD-Dev and out-of-distribution datasets show that SQLConductor achieves 73.2% execution accuracy, outperforming prior methods with comparable or larger backbones. The results demonstrate superior execution accuracy and strong generalization while coordinating frozen larger action models.

lab Hugging Face Blog · 8h ago

Analysis of Token Prediction Accuracy in Hybrid Language Models

A recent study investigates which specific tokens are predicted more accurately by hybrid language models compared to standard dense architectures. The research focuses on understanding the distribution of prediction errors across different token types, such as rare words and code snippets. By analyzing the loss landscapes, the authors identify that hybrid models excel at capturing long-range dependencies in sparse data regions. The findings suggest that the mixture of experts mechanism allows for more efficient parameter utilization during inference. This improved accuracy is particularly notable for tokens with low frequency in the training corpus. The paper provides a detailed breakdown of performance metrics across various benchmark datasets. These results highlight the potential of hybrid architectures for handling diverse linguistic structures effectively.

arxiv arXiv cs.AI · 8h ago

Self-Aware Scheduling Learns Token Unmasking Order in Diffusion Language Models

The authors propose Self-Aware Scheduling (SAS) to optimize the token unmasking order in masked diffusion language models, which significantly impacts generation quality. They derive a tractable upper bound on sequential decoding mismatch using Kullback-Leibler divergence and pathwise log-likelihood. This bound creates a dense self-aware reward that frames order selection as a policy optimization problem with a frozen denoiser. SAS learns a lightweight order policy via Group Relative Policy Optimization, supporting both any-order and semi-autoregressive decoding. On Sudoku tasks using a 1B parameter model, accuracy improved from 82.0% to 91.8%, reaching 97.5% after second-stage fine-tuning. For mathematical reasoning with LLaDA-8B, pass@1 on GSM8K increased from 64% to 76%. The method also raised MBPP scores from 39.5% to 41%, consistently matching or exceeding heuristic schedules across various parameters.
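
The group-relative step in Group Relative Policy Optimization can be illustrated with a minimal Python sketch. It assumes the commonly used formulation in which each sampled rollout's reward is standardized against the other rollouts drawn for the same prompt. The function name and the example rewards are hypothetical, and the summary does not say how SAS itself computes advantages.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward against its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every rollout gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical rewards for four unmasking orders sampled for one prompt.
advantages = group_relative_advantages([0.2, 0.5, 0.9, 0.4])
```

Orders that score above the group mean receive a positive advantage and the rest a negative one, so no separate value network is needed to form a baseline.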

arxiv arXiv cs.AI · 8h ago

KORE: Kolmogorov-Optimal Scaling Laws for Spline Regression

Researchers propose KORE, a method that solves for the optimal spline resolution in closed form rather than relying on hyperparameter search. The approach leverages classical approximation theory to pin squared bias to the Kolmogorov n-width and uses the PRESS identity for leave-one-out error estimation. By balancing these known curves, the algorithm analytically determines the minimizer without exhaustive grid sweeps. KORE extends this calculus to high dimensions by replacing ambient input dimension with interaction order in an ANOVA decomposition. The algorithm fits two pilot resolutions and solves a leverage-calibrated system to evaluate the plug-in resolution with minimal compute. Across additive and sparse pairwise targets up to 80 dimensions, KORE matches exhaustive cross-validation accuracy while fitting roughly eight times fewer models. On 36 real tabular datasets, it ranked first among 21 methods in accuracy per unit of compute.
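
The PRESS identity mentioned above gives every leave-one-out residual from a single least-squares fit: the ordinary residual divided by one minus the point's leverage. The sketch below checks the identity against brute-force refitting. It uses a straight-line fit rather than a spline basis purely to stay dependency-free, and all function names and data are hypothetical; it is not KORE's resolution solver.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for a straight line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    return y_bar - slope * x_bar, slope

def press_residuals(xs, ys):
    """Leave-one-out residuals from one fit: e_i / (1 - h_ii)."""
    n = len(xs)
    x_bar = sum(xs) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    intercept, slope = fit_line(xs, ys)
    residuals = []
    for x, y in zip(xs, ys):
        leverage = 1.0 / n + (x - x_bar) ** 2 / s_xx  # diagonal of the hat matrix
        residuals.append((y - (intercept + slope * x)) / (1.0 - leverage))
    return residuals

def refit_residuals(xs, ys):
    """The same quantities by brute force: refit with each point held out."""
    residuals = []
    for i in range(len(xs)):
        intercept, slope = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        residuals.append(ys[i] - (intercept + slope * xs[i]))
    return residuals

# Hypothetical data.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 2.3, 2.8, 4.2]
press = sum(r ** 2 for r in press_residuals(xs, ys))
```

The two routines agree up to floating-point error, which is why a leave-one-out error curve can be evaluated without one refit per held-out point.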

arxiv arXiv cs.AI · 9h ago

Enactor: A Generative Model for Closed-Loop Microsimulation of Signalized Intersections

The authors introduce Enactor, an actor-centric generative model designed for closed-loop microsimulation at signalized intersections. Unlike traditional simulators that rely on hand-crafted rules or short-horizon predictors, Enactor focuses on vehicle dynamics while treating pedestrians as contextual influences. The architecture encodes dynamic actors and lane polylines in polar coordinates relative to the intersection center. A transformer with separate spatial and temporal attention blocks predicts a distribution over each actor's next-step motion parameters. Training employs a closed-loop curriculum, exposing the model to its own predictions to ensure stability during simulation. Evaluations on two intersection geometries show Enactor recovers SUMO data generator distributions with significantly lower KL divergence than transformer baselines. The model also reduces red-light violations by more than an order of magnitude and outperforms constant-velocity baselines on real-world field data.

arxiv arXiv cs.AI · 9h ago

Persistent Homology Detects and Steers LLM Responses to Ill-Posed Questions

Researchers propose using finite zero-dimensional persistent homology to represent the topology of ill-posed questions within large language models. The method models contextual hidden states as point clouds, summarizing each transformer layer with three descriptors: mean finite lifetime, normalized lifetime entropy, and largest-lifetime concentration. These descriptors are concatenated across layers to form a unified topological representation of the query's internal state. The study introduces topology-conditioned activation steering, which retrieves similar examples to construct interventions that encourage clarification or abstention. Evaluations on AmbigQA, SituatedQA, and CLAMBER show this approach outperforms prompt-based baselines, improving classification accuracy from 67.4% to 78.9% on AmbigQA. On SituatedQA, accuracy increased from 79.9% to 88.5%, while CLAMBER saw gains from 57.6% to 69.6%. Additionally, the steering mechanism raised the average total acceptable response rate from 61.4% to 70.6% across three open-weight LLMs.
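
As a rough illustration of the three per-layer descriptors, the sketch below computes finite zero-dimensional lifetimes for a small point cloud. It relies on a standard fact: under a Vietoris-Rips filtration every component is born at scale 0, so the finite lifetimes equal the edge lengths of a minimum spanning tree. The exact normalizations used in the paper are not given in the summary, so the entropy and concentration formulas here are assumptions, as are the function names and the sample points.

```python
import math
from itertools import combinations

def finite_h0_lifetimes(points):
    """Finite H0 bar lengths, obtained as Kruskal MST edge lengths."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    lifetimes = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # two components merge, so one bar dies here
            parent[root_i] = root_j
            lifetimes.append(length)
    return lifetimes

def layer_descriptors(points):
    """Assumed definitions; expects at least two distinct points."""
    lifetimes = finite_h0_lifetimes(points)
    total = sum(lifetimes)
    probs = [life / total for life in lifetimes if life > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    normalized = entropy / math.log(len(probs)) if len(probs) > 1 else 0.0
    return {
        "mean_lifetime": total / len(lifetimes),
        "normalized_entropy": normalized,
        "max_concentration": max(lifetimes) / total,
    }

# Hypothetical 2-D stand-in for one layer's hidden states.
desc = layer_descriptors([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)])
```

Concatenating one such dictionary per transformer layer would give a fixed-length vector of the kind the summary describes.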

arxiv arXiv cs.AI · 9h ago

SPIRAL: Learning to Search and Aggregate

The authors introduce Sequential-Parallel-Aggregative Reinforcement Learning (SPIRAL), a framework that trains language models to utilize sequential, parallel, and aggregative reasoning primitives simultaneously. Unlike standard post-training methods that optimize only for single-trace sequential reasoning, SPIRAL unifies these components into a single inference compute pipeline. The model first samples independent traces in parallel using chain-of-thought reasoning and then generates a final aggregation trace conditioned on those inputs. This entire process is optimized end-to-end against the reward of the final aggregated response using set reinforcement learning and standard reinforcement learning techniques. Experiments on reasoning tasks demonstrate that SPIRAL effectively scales with inference compute resources. The approach outperforms GRPO by up to 11 times in scaling efficiency and achieves 15% higher performance when all three compute primitives are scaled.

arxiv arXiv cs.AI · 9h ago

Polycepta: Object-Centric Appearance Estimation for Multi-Object Tracking

The authors introduce Polycepta, an object-centric appearance state estimation framework that reformulates appearance modeling as a recursive estimation problem. Unlike traditional methods relying on static, frame-independent descriptors, Polycepta constructs and continuously updates independent appearance states for each tracked object. Through a specific learning strategy, future representations are estimated from accumulated observations rather than memorized. A key feature is that appearance estimation quality improves progressively as object states evolve during inference. The framework enables appearance estimation for unseen classes by encouraging the learning of object-specific representation construction. Extensive experiments on KITTI, Waymo Open Dataset, and MOT17 demonstrate consistent reductions in identity switches and improved tracking performance. When integrated into the RobMOT framework, Polycepta operates at 90.57 Hz and achieves a MOTA of 92.27% on the KITTI benchmark.

arxiv arXiv cs.AI · 9h ago

Dual-Learned Matching Enables Linear Mode Connectivity for Billion-Parameter Transformers

Researchers propose a scalable framework to enable linear mode connectivity-based merging for billion-parameter pretrained transformers. Existing methods typically optimize interpolation paths from only one model endpoint, limiting scalability for large architectures. The new approach applies parameterized weight transformations to align functionally equivalent solutions and uses a dual learning procedure where both models jointly learn transformations toward a shared path. This bidirectional optimization substantially reduces interpolation barriers and improves merging reliability across large-scale models. Empirically, the method achieves near-zero loss barriers on WikiText for medium-sized language models. In vision tasks, ViT-L maintains above 69% ImageNet top-1 accuracy throughout the interpolation path. Modern billion-parameter LLMs exhibit only small loss barriers using this technique.
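
The "loss barrier" reported here is usually measured as the largest amount by which the loss along the straight line between two solutions exceeds the linear interpolation of the endpoint losses. The sketch below computes that quantity for a toy one-parameter double-well loss. The function names, the loss, and the grid size are hypothetical, and the sketch only measures a barrier; it does not perform the dual-learned alignment the summary describes.

```python
def interpolate(theta_a, theta_b, t):
    """Point at fraction t on the straight line from theta_a to theta_b."""
    return [(1.0 - t) * a + t * b for a, b in zip(theta_a, theta_b)]

def loss_barrier(loss_fn, theta_a, theta_b, num_points=11):
    """Max excess of the path loss over the interpolated endpoint losses."""
    loss_a, loss_b = loss_fn(theta_a), loss_fn(theta_b)
    worst = 0.0
    for k in range(num_points):
        t = k / (num_points - 1)
        path_loss = loss_fn(interpolate(theta_a, theta_b, t))
        worst = max(worst, path_loss - ((1.0 - t) * loss_a + t * loss_b))
    return worst

def double_well(theta):
    """Toy loss with two separate minima at -1 and +1."""
    return (theta[0] ** 2 - 1.0) ** 2

# Both endpoints have zero loss, but the midpoint between them does not.
barrier = loss_barrier(double_well, [-1.0], [1.0])
```

A barrier near zero means the two solutions are linearly mode connected, which is the condition that makes weight-space merging well behaved.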

arxiv arXiv cs.AI · 9h ago

Neural Classification Trees Disentangle Latent Subgroups for Robust ML

Machine learning models often exploit spurious correlations, leading to high average accuracy but poor performance on underrepresented subgroups. Existing mitigation strategies typically adjust network parameters using subgroup annotations or inferred pseudo-labels. However, these methods generally output only a class prediction at inference time, lacking insight into a sample's latent subgroup structure. To address this, the authors propose Neural Classification Trees (NCT), a framework that encodes subgroup structure within its tree-shaped architecture. NCT routes each sample to an easy or hard node based on prediction correctness and reuses these routes as pseudo-labels for subsequent iterations. This process disentangles conflicting subgroups without requiring explicit subgroup supervision. The approach was evaluated on five benchmarks spanning binary and multi-class spurious correlations. Experiments demonstrate that the learned tree topology isolates minority subgroups, providing strong interpretability and competitive robustness compared to state-of-the-art methods.

arxiv arXiv cs.AI · 11h ago

Self-Filtering: Iterative Data Selection for Vision-Language Models

The authors propose Self-Filtering, a bootstrapped method that addresses noise in large-scale vision-language datasets without relying on manual oversight or curated references. The approach trains a CLIP model on an evolving dataset that balances filtered samples with a high probability of being clean against diverse examples from the entire distribution. The process alternates between training the model and selecting an improved data mixture for the next round. By continuously refining the dataset through this cycle, the method reduces the need for additional external data sources. The study reports that training on these self-selected datasets improves downstream performance. The technique does not depend on pre-trained models or heuristic-based filtering strategies.

arxiv arXiv cs.AI · 11h ago

DiT-Reward: Using Diffusion Transformer Representations for Text-to-Image Reward Modeling

The authors introduce DiT-Reward, a method that converts a pretrained text-to-image Diffusion Transformer into a reward model by aggregating text-conditioned image representations across transformer layers. Evaluated under the same training data mixture as HPSv3, DiT-Reward outperforms HPSv3 on all four preference benchmarks, achieving 85.6% on HPDv2 and 77.6% on HPDv3. The study reveals that downstream reward performance is strongest in middle-to-late layers and benefits from combining representations across different stages. Even with a frozen generative backbone, a lightweight learned head can extract meaningful preference predictions from these representations. When used to optimize Stable Diffusion 3.5 Large with Flow-GRPO, DiT-Reward surpasses HPSv3 along the matched training trajectory, showing clear gains in realism. Additionally, direct latent scoring provides a 1.65x inference speedup over HPSv3 while maintaining comparable peak memory usage. These results demonstrate that pretrained generative Diffusion Transformers provide transferable representations for reward modeling and policy optimization.

arxiv arXiv cs.AI · 11h ago

Learning Process Rewards via Success Visitation Matching for Efficient RL

The authors address the challenge of training reinforcement learning policies with inherently sparse outcome rewards, which lead to difficult credit assignment problems. They propose a method to transform these sparse rewards into dense process rewards by training a discriminator to distinguish between successful and unsuccessful episodes. This discriminator incentivizes the policy to match the state-action visitations of successful episodes while avoiding those of unsuccessful ones. The resulting process rewards give dense feedback on progress toward task completion and provably leave the optimal policy unchanged. The method is specifically applied to the finetuning of robotic control policies for manipulation tasks. Experimental results demonstrate significantly faster RL finetuning in both simulated and real-world environments compared to maximizing sparse outcome rewards alone.

arxiv arXiv cs.AI · 12h ago

Tapered Language Models: Improving Performance via Depth-Aware Capacity Allocation

Modern language models typically allocate parameters uniformly across identical layers, despite evidence that later layers primarily refine the residual stream rather than transform it. To address this asymmetry, researchers investigated whether parameter capacity should vary by depth under a fixed budget. Controlled experiments demonstrated that allocating more capacity to earlier layers and less to later layers improves perplexity compared to uniform baselines, while the reverse allocation degrades performance. Building on these results, the authors introduce Tapered Language Models (TLMs), an architectural principle where parameter-bearing components are monotonically tapered across depth. MLPs serve as the primary site for this instantiation due to their dominance in parameter count and clear width axis. The study tested tapering via a smooth cosine schedule across three model scales and four architectures, including Transformer, Gated Attention, Hope-attention, and Titans. Results show that TLMs consistently improve perplexity and downstream benchmark performance over uniform baselines without additional compute costs. These findings establish depth-aware capacity allocation as a simple, architecture-agnostic design lever for language models.
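
One way to picture a cosine taper under a fixed budget is the sketch below, which spreads MLP hidden widths along half a cosine period so that early layers are wider and late layers narrower while the total width is unchanged. The parameterization (a `taper` fraction around a `base_width`) and all names are assumptions for illustration; the summary does not give the schedule's exact form, and rounding to hardware-friendly sizes is omitted.

```python
import math

def cosine_tapered_widths(num_layers, base_width, taper):
    """Widths decaying from base*(1+taper) to base*(1-taper); needs >= 2 layers."""
    widths = []
    for layer in range(num_layers):
        # 1.0 at the first layer, 0.0 at the last, following half a cosine period.
        frac = 0.5 * (1.0 + math.cos(math.pi * layer / (num_layers - 1)))
        widths.append(base_width * (1.0 - taper + 2.0 * taper * frac))
    return widths

# Hypothetical 12-layer model with a uniform MLP width of 4096.
widths = cosine_tapered_widths(num_layers=12, base_width=4096, taper=0.25)
```

Because the cosine terms cancel in symmetric pairs, the widths sum to `num_layers * base_width`. MLP parameter count grows linearly with hidden width at a fixed model dimension, so this sketch keeps the MLP budget equal to the uniform baseline.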

arxiv arXiv cs.AI · 12h ago

NVIDIA Nemotron Challenge: String Matching and Backtracking for Bit Manipulation Puzzles

This paper details algorithmic innovations developed for the NVIDIA Nemotron Model Reasoning Challenge, specifically targeting bit manipulation puzzles where models must deduce hidden logical rules. To address the combinatorial explosion of bitwise operations and LLM hallucinations, the authors abandon arithmetic logic in favor of string similarity and structured search. The core contribution reframes logic-gate deduction as a base-selection task using minimal bit flips to isolate primitive transformations. A backtracking depth-first search process is formalized to test candidates, detect logical collisions, and perform robust error recovery. Additionally, the method employs bit tokenization and interactive reasoning supervised fine-tuning with dynamic masking to simulate oracle feedback. Evaluated on these puzzles, the approach achieved over 96% validation accuracy. This performance secured the highest result in the category and a seventh-place finish in the overall contest.

arxiv arXiv cs.AI · 12h ago

PsyBridge: A Hybrid Framework for Multi-Dimensional Mental Health Assessment

The study introduces PsyBridge, a hybrid intelligent framework designed to address the limitations of isolated screening instruments in mental health assessment. This system integrates clinically validated tools like PHQ-9 and GAD-7 with cognitive evaluation and personality profiling within a unified architecture. A modular design employing a weighted aggregation mechanism generates interpretable risk classifications and recommendations for users. To evaluate performance, researchers constructed a semi-synthetic dataset comprising 500 patient profiles based on clinically grounded score distributions. Experimental results show that PsyBridge achieves an overall accuracy of 0.84, outperforming standalone PHQ-9 and GAD-7 assessments. The framework also demonstrates improvements in precision, recall, and F1-score compared to existing methods. Sensitivity analysis confirms that integrating cognitive and personality components stabilizes classification performance and reduces prediction inconsistencies. These findings suggest PsyBridge offers a scalable approach for AI-assisted decision support in digital healthcare environments.

arxiv arXiv cs.AI · 12h ago

Open Problem: Is AdamW Effective Under Heavy-Tailed Noise?

AdamW serves as the standard optimizer for training large language models, yet its theoretical foundation remains largely confined to finite-variance regimes. This gap is significant because empirical evidence suggests that stochastic gradient noise during LLM pretraining typically exhibits heavy-tailed characteristics. Recent studies have demonstrated that sign-based optimizers like Lion and Muon achieve sharp convergence rates under heavy-tailed conditions, while AdaGrad also converges in this setting. However, rigorous convergence theory for AdamW has not yet been established within these heavy-tailed assumptions. The authors pose an open problem regarding whether AdamW can converge under the same heavy-tailed assumptions or if its second-moment accumulator creates a genuine obstruction. To address this, they formulate a positive weighted-metric benchmark and provide a corridor lower-bound mechanism. This mechanism illustrates how denominator memory in AdamW can effectively hide large gradients, potentially impacting its performance.
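
The general idea of denominator memory can be seen in a scalar Adam-style update, sketched below with the usual default decay rates. The gradient sequence is hypothetical, decoupled weight decay is left out because it does not affect the normalized step, and the sketch does not reproduce the authors' corridor lower-bound construction.

```python
import math

def normalized_steps(grads, beta1=0.9, beta2=0.999, eps=1e-8):
    """Per-step update magnitude m_hat / (sqrt(v_hat) + eps), learning rate 1."""
    m = v = 0.0
    steps = []
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1.0 - beta1) * g        # first-moment average
        v = beta2 * v + (1.0 - beta2) * g * g    # second-moment accumulator
        m_hat = m / (1.0 - beta1 ** t)           # bias corrections
        v_hat = v / (1.0 - beta2 ** t)
        steps.append(abs(m_hat) / (math.sqrt(v_hat) + eps))
    return steps

# Ten unit gradients, one heavy-tailed spike, then one hundred unit gradients.
steps = normalized_steps([1.0] * 10 + [1000.0] + [1.0] * 100)
```

In this run the spike itself produces a smaller normalized step than the unit gradients before it, and the slowly decaying accumulator keeps later steps small for on the order of `1 / (1 - beta2)` iterations.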

arxiv arXiv cs.AI · 12h ago

AIR: Adaptive Interleaved Reasoning with Code in MLLMs

This paper introduces AIR, a method that empowers multimodal large language models with adaptive interleaved reasoning capabilities through extended reinforcement learning training on code-augmented complex numerical computation tasks. The authors address the limitation of existing literature, which primarily focuses on tool-use within vision-perception tasks and relies on predefined heuristics incapable of handling numerical computations. To solve this, they propose a comprehensive three-component solution including a two-stage cold-start data construction pipeline, data filtering strategies for reinforcement learning dataset curation, and an adaptive tool-invocation strategy leveraging a group-constrained reward function. Extensive experiments demonstrate that after reinforcement learning training with this reward function, performance improves by an average of 6.1 percentage points on evaluation benchmarks. Specifically, the accuracy for interleaved reasoning samples increases by 9.9 percentage points, while the overall success rate of tool-use exceeds 95 percent. The researchers provide their data and code for public access at a specified GitHub repository.

arxiv arXiv cs.AI · 12h ago

Semantic Browsing: Controllable Diversity for Image Generation

Modern text-to-image models often suffer from diversity collapse despite high fidelity. The authors introduce Semantic Browsing to enable controlled diversity through structured image galleries. This method allows users to navigate meaningful axes of variation rather than incidental noise. The approach exploits the decoupling of semantic decision-making and pixel generation in recent models. Diversity is induced directly at the text level using rich textual representations. A Vision Language Model operates on full scene context within an agentic workflow. This workflow explicitly enforces structured variation attuned to the original prompt. The result is a navigable design space with interpretable semantic decisions.

arxiv arXiv cs.AI · 12h ago

CoorDex: Coordinating Body and Hand Priors for Continuous Dexterous Humanoid Loco-Manipulation

The authors introduce CoorDex, a learning pipeline that enables high-degree-of-freedom dexterous loco-manipulation on moving humanoids. This approach converts high-dimensional body and hand control into coordinated latent residual control, overcoming the limitations of traditional stop-and-go methods. The system trains privileged motion tracking teachers from simulated demonstrations and distills them into proprioception-conditioned latent priors. These frozen priors serve as the action space for downstream residual reinforcement learning via a policy that composes task context with separate body-hand residual heads. CoorDex allows a Unitree G1 humanoid equipped with a 20-DoF WUJI hand to perform complex tasks while in motion, such as non-stop bottle grasping and fridge door opening. Ablation studies demonstrate that joint-space PPO and monolithic latent prediction fail under similar reward budgets, whereas the proposed latent-prior interface ensures trainability for contact-rich manipulation.