This article introduces a continuous Latent Bridge that couples frozen reactive and reasoning vision-language models to enable real-time game agents with millisecond latency and long-horizon planning. By projecting the slow model's residuals into the fast model's input-embedding space, it avoids text round-trips while matching or beating traditional Text Bridges in performance.
- The Latent Bridge matches or beats the Text Bridge across 7 Atari games and the MetaDrive driving domain.
- It raises the MsPacman score by 57% and the RoadRunner score by 28% relative to the baseline reactive models.
- Combining both channels causes destructive interference, reducing RoadRunner performance by 96%.
- The bridge's benefit is highly predictable: it correlates at r=0.93 with the gain of slow reasoning over fast reaction.
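
The correlation in the last point can be read as a Pearson coefficient between two per-game quantities: how much the slow model outperforms the fast one, and how much the bridge helps. The sketch below shows one way such a coefficient could be computed. The function name `pearson_r` and the two input lists are illustrative assumptions, and the numbers are made-up placeholders, not results from the article.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# PLACEHOLDER values for illustration only (one entry per hypothetical game).
slow_over_fast_gain = [0.05, 0.10, 0.30, 0.45, 0.60]
bridge_gain = [0.02, 0.08, 0.20, 0.35, 0.50]

print(f"r = {pearson_r(slow_over_fast_gain, bridge_gain):.2f}")
```

A value near 1 would indicate that the bridge helps most where slow reasoning already beats fast reaction by the widest margin.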
The approach provides a safe drop-in solution for agents requiring both rapid action and complex planning, with reproducible pipelines and replay recordings released.
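
To make the core mechanism concrete, here is a minimal, dependency-free sketch of projecting a slow model's residual vector into a fast model's input-embedding space and prepending the result as extra input tokens. It is an illustration under stated assumptions, not the article's implementation: the linear projector, the number of bridge tokens, the dimensions, and all function names (`make_projector`, `project`, `prepend_bridge`) are hypothetical.

```python
import random


def make_projector(d_slow, d_fast, n_tokens, seed=0):
    """Random linear map from d_slow to n_tokens * d_fast (hypothetical choice).

    In a real system this would be the only trained component; both
    vision-language models stay frozen.
    """
    rng = random.Random(seed)
    scale = d_slow ** -0.5
    return [[rng.gauss(0.0, scale) for _ in range(d_slow)]
            for _ in range(n_tokens * d_fast)]


def project(weights, slow_residual, n_tokens, d_fast):
    """Map one slow-model residual vector to n_tokens vectors of size d_fast."""
    flat = [sum(w * x for w, x in zip(row, slow_residual)) for row in weights]
    return [flat[i * d_fast:(i + 1) * d_fast] for i in range(n_tokens)]


def prepend_bridge(bridge_tokens, fast_embeds):
    """Place the bridge tokens ahead of the fast model's own input embeddings."""
    return bridge_tokens + fast_embeds


# Hypothetical sizes chosen only to keep the example small.
D_SLOW, D_FAST, N_TOKENS = 8, 4, 2

weights = make_projector(D_SLOW, D_FAST, N_TOKENS)
slow_residual = [0.1 * i for i in range(D_SLOW)]        # stand-in for a slow-model state
fast_embeds = [[0.0] * D_FAST for _ in range(3)]        # stand-in for 3 input embeddings

bridge_tokens = project(weights, slow_residual, N_TOKENS, D_FAST)
fused = prepend_bridge(bridge_tokens, fast_embeds)
print(len(fused), len(fused[0]))  # 5 4
```

Because the slow model's output stays a continuous vector, nothing is decoded to text and re-encoded, which is the round-trip the summary says the Latent Bridge avoids.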