Agentic AI for Bilevel Long-Term Optimization of Policy-Driven Physical Layer Systems

This paper introduces Agentic-LTPO, a nested bilevel optimization framework designed to address the limitations of fixed-objective methods in physical layer systems facing dynamic operator policies and real-time constraints. The framework utilizes agentic AI to generate upper-level configurations that translate evolving policies and historical experiences into structured lower-level problems for immediate decision-making.

Agentic-LTPO employs a nested bilevel structure: the upper level generates configurations from policy changes and environment summaries, and the lower level solves the resulting problems to make real-time physical-layer decisions.
The study uses cell-free MIMO beamforming as a use case, implementing a multi-agent decision process with retrieval-augmented experience-based verification in the upper level and a closed-form beamformer in the lower level.
Experiments show that Agentic-LTPO improves long-term system performance by 57.2% over traditional methods while remaining adaptable to dynamic operator policies.
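
The two-timescale structure described above can be illustrated with a minimal Python sketch. It is not the paper's implementation, and everything specific in it is an assumption made for illustration: the policy names, the CANDIDATES table (a stand-in for the agentic upper level), the flat experience log (a stand-in for retrieval-augmented verification), regularized zero-forcing as the closed-form beamformer, i.i.d. Rayleigh channels, and sum rate as the performance metric.

```python
import numpy as np

# Hypothetical policy -> candidate lower-level configurations. The paper's
# upper level is agent-based; this lookup table only stands in for it.
CANDIDATES = {
    "max_throughput": [{"reg": 0.01, "power": 1.0}, {"reg": 0.1, "power": 1.0}],
    "energy_saving": [{"reg": 0.01, "power": 0.5}, {"reg": 0.1, "power": 0.5}],
}


def upper_level_config(policy, experience):
    """Pick a configuration for `policy`, reusing logged experience if any."""
    past = [e for e in experience if e["policy"] == policy]
    tried = [e["config"] for e in past]
    for cand in CANDIDATES[policy]:
        if cand not in tried:
            return cand  # try an untested candidate first
    # All candidates tested: reuse the one with the best logged average rate.
    return max(past, key=lambda e: e["rate"])["config"]


def rzf_beamformer(H, reg, power):
    """Regularized zero-forcing (assumed): W = H^H (H H^H + reg*I)^-1.

    H has shape (users, antennas); W has shape (antennas, users) and is
    scaled so that its total transmit power equals `power`.
    """
    K = H.shape[0]
    W = H.conj().T @ np.linalg.inv(H @ H.conj().T + reg * np.eye(K))
    return W * np.sqrt(power) / np.linalg.norm(W)  # Frobenius-norm scaling


def sum_rate(H, W, noise=1e-2):
    """Sum of per-user rates in bit/s/Hz for channel H and beamformer W."""
    G = np.abs(H @ W) ** 2  # G[k, j]: power user k receives from beam j
    signal = np.diag(G)
    interference = G.sum(axis=1) - signal
    return float(np.sum(np.log2(1 + signal / (interference + noise))))


def run_episode(policies, slots_per_epoch=5, users=3, antennas=8, seed=0):
    """Outer loop over policy epochs, inner loop over real-time slots."""
    rng = np.random.default_rng(seed)
    experience, rates = [], []
    for policy in policies:  # slow timescale: one configuration per epoch
        config = upper_level_config(policy, experience)
        epoch_rates = []
        for _ in range(slots_per_epoch):  # fast timescale: closed-form solve
            H = (rng.standard_normal((users, antennas))
                 + 1j * rng.standard_normal((users, antennas))) / np.sqrt(2)
            W = rzf_beamformer(H, config["reg"], config["power"])
            epoch_rates.append(sum_rate(H, W))
        experience.append({"policy": policy, "config": config,
                           "rate": float(np.mean(epoch_rates))})
        rates.extend(epoch_rates)
    return rates


if __name__ == "__main__":
    rates = run_episode(["max_throughput", "energy_saving", "max_throughput"])
    print(f"{len(rates)} slots, mean sum rate {np.mean(rates):.2f} bit/s/Hz")
```

The point of the split is that the expensive reasoning runs only once per policy epoch, while each slot needs just one small matrix inversion, which is what keeps the lower level compatible with real-time constraints.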

This approach lets physical layer systems remain effective and adaptive under changing service requirements and stringent real-time constraints, improving long-term performance over static optimization techniques.