R2D-RL bridges RCSS2D and HELIOS-based clients with a Python multi-agent reinforcement learning (MARL) interface using shared memory and cycle-level synchronization. It supports full-field and scenario-based training with configurable opponents, action masks, EPV-based reward shaping, and parallel execution. It also provides front-goal scenarios and an 11-vs-11 benchmark with baseline results.
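
To make the idea of a shared-memory bridge with cycle-level synchronization concrete, the following is a minimal sketch and not R2D-RL's actual implementation. The memory layout, the `CycleChannel` class, and all field and method names are hypothetical, chosen only for illustration. Both endpoints live in one process here to keep the example short, whereas a real bridge would place the client and the learner in separate processes and would need proper cross-process signalling (for example semaphores) rather than plain flags.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical layout (an assumption, not the R2D-RL format):
# cycle, obs_ready, act_ready, action as int32, then ball_x as float64.
LAYOUT = struct.Struct("<iiiid")


class CycleChannel:
    """One shared block that carries a single observation/action pair per cycle."""

    def __init__(self, name=None, create=False):
        # The size argument is ignored when attaching to an existing block.
        self.shm = shared_memory.SharedMemory(
            name=name, create=create, size=LAYOUT.size
        )
        if create:
            self._write(0, 0, 0, 0, 0.0)

    def _read(self):
        return LAYOUT.unpack_from(self.shm.buf, 0)

    def _write(self, *fields):
        LAYOUT.pack_into(self.shm.buf, 0, *fields)

    # Client side (the soccer agent process in this sketch).
    def publish_observation(self, cycle, ball_x):
        # A new observation invalidates any action left over from an earlier cycle.
        self._write(cycle, 1, 0, 0, ball_x)

    def take_action(self, cycle):
        current, _, act_ready, action, _ = self._read()
        # Accept an action only if it was written for this exact cycle.
        return action if (current == cycle and act_ready) else None

    # Learner side (the Python MARL process in this sketch).
    def read_observation(self):
        cycle, obs_ready, _, _, ball_x = self._read()
        return (cycle, ball_x) if obs_ready else None

    def publish_action(self, cycle, action):
        current, _, _, _, ball_x = self._read()
        if current != cycle:
            raise ValueError("action refers to a stale cycle")
        self._write(cycle, 0, 1, action, ball_x)

    def close(self, unlink=False):
        self.shm.close()
        if unlink:
            self.shm.unlink()


def demo():
    client = CycleChannel(create=True)
    learner = CycleChannel(name=client.shm.name)
    try:
        client.publish_observation(cycle=1, ball_x=12.5)
        assert client.take_action(1) is None  # no action published yet
        cycle, ball_x = learner.read_observation()
        learner.publish_action(cycle, action=3)
        return cycle, ball_x, client.take_action(1)
    finally:
        learner.close()
        client.close(unlink=True)


if __name__ == "__main__":
    print(demo())
```

The cycle number stored next to the flags is what keeps the two sides in step in this sketch: the client accepts an action only when it carries the same cycle as the observation it published, so a late action from an earlier cycle is never applied.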