MORL-A2C: Multi-Objective Reinforcement Learning Reranker for Health

Researchers introduce MORL-A2C, a sequential decision-making extension to the MOPI-HFRS system that uses an Advantage Actor-Critic algorithm to optimize the trade-off between user preference and nutritional health in food recommendations.

The model formulates recommendation as a K-step reranking problem using frozen GNN embeddings and a scalarized relevance/health reward.
The policy is initialized via behavior cloning against a dot-product ranker derived from the same embeddings.
The authors identified and corrected a bug in the original MOPI-HFRS evaluation pipeline, and all baseline performance metrics were updated accordingly.
On the macro-nutrient benchmark, MORL-A2C raises H-Score@20 from 46.05% to 69.57% (+23.52 percentage points), while Recall@20 falls from 25.64% to 23.61% (-2.03 points) and NDCG@20 falls from 23.52% to 20.64% (-2.88 points).
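
To make the K-step reranking formulation concrete, here is a minimal Python sketch, not the authors' implementation. It assumes the scalarized reward is a convex combination `weight * relevance + (1 - weight) * health`, that relevance is the dot product of frozen user and item embeddings, and that each item has a scalar health score. A greedy selector stands in for the learned A2C actor. All function names, the `weight` parameter, and the toy data are hypothetical.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def scalarized_reward(relevance: float, health: float, weight: float) -> float:
    """Assumed reward form: convex combination of relevance and health."""
    return weight * relevance + (1.0 - weight) * health


def dot_product_ranking(user_emb: Sequence[float],
                        item_embs: Sequence[Sequence[float]],
                        k: int) -> List[int]:
    """Baseline ranker: top-k items by user-item dot product."""
    scores = [dot(user_emb, e) for e in item_embs]
    return sorted(range(len(item_embs)), key=lambda i: -scores[i])[:k]


def greedy_rerank(user_emb: Sequence[float],
                  item_embs: Sequence[Sequence[float]],
                  health_scores: Sequence[float],
                  k: int,
                  weight: float) -> List[int]:
    """K-step episode: each step appends one not-yet-chosen item to the slate.

    A greedy argmax over the scalarized reward stands in for the learned
    policy; a trained actor would sample or select the action instead.
    """
    remaining = list(range(len(item_embs)))
    slate: List[int] = []
    for _ in range(min(k, len(remaining))):
        best = max(
            remaining,
            key=lambda i: scalarized_reward(
                dot(user_emb, item_embs[i]), health_scores[i], weight
            ),
        )
        slate.append(best)
        remaining.remove(best)
    return slate


# Toy data (hypothetical): items 0 and 1 are relevant but unhealthy,
# items 2 and 3 are healthy but less relevant.
user = [1.0, 0.0]
items = [[0.9, 0.1], [0.8, 0.0], [0.2, 0.5], [0.1, 0.9]]
health = [0.1, 0.2, 0.9, 1.0]

baseline_slate = dot_product_ranking(user, items, k=2)
health_aware_slate = greedy_rerank(user, items, health, k=2, weight=0.3)
```

With `weight=1.0` the greedy selector reproduces the dot-product ranking, which is the behavior a policy initialized by behavior cloning would start from. Lowering `weight` moves healthier items into the slate at the cost of relevance, mirroring the direction of the reported H-Score and Recall/NDCG changes.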

These findings suggest that policy-driven sequential optimization can effectively navigate the health-preference trade-off in multi-objective food recommendation systems, accepting a small loss in ranking accuracy for a large gain in healthfulness.