Researchers introduce DigenRL, a disaggregated reinforcement learning framework designed to address the inefficiencies of colocated execution in diffusion-based generative models. The system supports flexible resource allocation and heterogeneous GPUs while utilizing novel parallelism techniques to reduce execution bubbles.

  • Introduces generation-axis pipeline (GAP) and time-step parallelism (TSP) for finer-grained pipelining between rollout and training.
  • Proposes elastic trainer-assisted generation (TAG) to allow trainer GPU resources to dynamically assist in executing rollout generations.
  • Implements a tightly constrained one-step asynchronous strategy to utilize the tail bubble in the pipeline.
  • Achieves 1.56x to 2.10x throughput improvements over state-of-the-art systems such as veRL-Omni and GenRL across multiple hardware testbeds.
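
To make the idea of a pipeline bubble concrete, here is a minimal sketch of a generic two-stage pipeline in which one resource pool produces rollouts and another trains on them. It is not DigenRL's scheduler: the function names, the fixed per-chunk timings, and the assumption that each stage handles one chunk at a time are all hypothetical simplifications chosen for illustration. The sketch only shows why splitting the same work into finer chunks can shrink the time the trainer spends waiting.

```python
def pipelined_makespan(n_chunks: int, rollout_time: float, train_time: float) -> float:
    """Finish time of a two-stage pipeline (toy model, not DigenRL's scheduler).

    Assumes every chunk takes the same fixed time in each stage, and that
    training on a chunk can start only after its rollout has finished.
    """
    rollout_done = 0.0  # time at which the rollout stage finishes the current chunk
    train_done = 0.0    # time at which the training stage finishes the current chunk
    for _ in range(n_chunks):
        rollout_done += rollout_time
        # The trainer waits for the rollout if it is not ready yet.
        train_done = max(train_done, rollout_done) + train_time
    return train_done


def trainer_idle_time(n_chunks: int, rollout_time: float, train_time: float) -> float:
    """Total time the training stage is not doing useful work (the bubble)."""
    busy = n_chunks * train_time
    return pipelined_makespan(n_chunks, rollout_time, train_time) - busy


# Same total work, two granularities (hypothetical timings in arbitrary units).
coarse = pipelined_makespan(4, rollout_time=3.0, train_time=1.0)
fine = pipelined_makespan(8, rollout_time=1.5, train_time=0.5)
print(f"coarse chunks: makespan={coarse}, trainer idle={trainer_idle_time(4, 3.0, 1.0)}")
print(f"fine chunks:   makespan={fine}, trainer idle={trainer_idle_time(8, 1.5, 0.5)}")
```

In this toy model the rollout stage is the bottleneck, so the trainer stays idle most of the time at either granularity. Finer chunks only shorten the tail, the period after the last rollout completes. Lending idle trainer capacity to generation, as the TAG bullet describes, targets the remaining idle time, which this sketch does not model.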

DigenRL enables efficient task scheduling and independent scaling for diffusion generative models, significantly improving performance compared to existing implementations that couple rollout and training resources.