Learning Fair Pareto-Optimal Policies in Multi-Objective Reinforcement Learning
The paper introduces a framework for multi-policy multi-objective reinforcement learning that learns a set of Pareto-optimal policies ensuring fairness across diverse user preferences. It proves fair policies remain within the convex coverage set for concave welfare functions and proposes three algorithms that incorporate non-stationary and stochastic policy dynamics. Empirical results show these methods effectively learn fair policies adaptable to varying user preferences.