Researchers propose AIGP, a framework using Large Language Models to address interpretability and long-term objective misalignment in e-commerce dynamic pricing. The system employs supervised fine-tuning and a Long-Term Value Estimator trained via offline reinforcement learning to align pricing decisions with business goals.

  • Utilizes LLMs prompted with domain knowledge, structured data, and textual context for interpretable pricing.
  • Implements a Long-Term Value Estimator (LTVE) trained via offline reinforcement learning as a reward model.
  • Applies Direct Preference Optimization (DPO) to align the pricing policy with long-term business objectives.
  • Achieved +13.21% GMV, +7.59% ROI, and +8.20% milestone achievement rate in 14-day A/B tests on Tao Factory.

The framework enables interpretable and transparent pricing rationales while significantly improving key performance metrics such as Gross Merchandise Value and Return on Investment.