Themis: An explainable AI-enabled framework for Reinforcement Learning with Human Feedback

The authors introduce Themis, an XAI-enabled testing and evaluation framework that combines transparency through explainability with alignment via human feedback for safe Reinforcement Learning systems.

Supports over 200 widely used environments and is easily configurable for experiments in RL, transparency, and alignment.
Uses human preferences to train reward models that match or outperform the environment's true reward signal.
Provides a user-friendly, auto-scalable cloud-based platform for collecting human feedback and managing experiments.
Tests show that the platform can support one thousand users in back-to-back experiments on a modest commercial machine without extra development overhead.
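
The summary above does not say how Themis fits its reward models, so the following is only a minimal sketch of one common approach in preference-based RL: a Bradley-Terry model over pairs of trajectory segments, fitted by gradient descent. The linear reward, the function names (`segment_return`, `preference_probability`, `train_reward_model`, `make_synthetic_preferences`), and the synthetic data are all assumptions made for illustration, not part of Themis.

```python
import math
import random


def segment_return(weights, segment):
    """Predicted return of a segment: sum of linear per-step rewards."""
    return sum(sum(w * x for w, x in zip(weights, step)) for step in segment)


def preference_probability(weights, seg_a, seg_b):
    """Bradley-Terry probability that seg_a is preferred over seg_b."""
    diff = segment_return(weights, seg_a) - segment_return(weights, seg_b)
    # Numerically stable logistic function.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def train_reward_model(preferences, n_features, lr=0.1, epochs=200):
    """Fit linear reward weights from pairwise preferences.

    preferences: list of (seg_a, seg_b, label), where label is 1.0 if the
    (hypothetical) annotator preferred seg_a and 0.0 otherwise.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for seg_a, seg_b, label in preferences:
            p = preference_probability(weights, seg_a, seg_b)
            # Gradient of the cross-entropy loss with respect to weight i is
            # (p - label) * (summed feature i of seg_a - summed feature i of seg_b).
            for i in range(n_features):
                feat_diff = sum(s[i] for s in seg_a) - sum(s[i] for s in seg_b)
                weights[i] -= lr * (p - label) * feat_diff
    return weights


def make_synthetic_preferences(true_weights, n_pairs=50, steps=3, seed=0):
    """Simulate an annotator who always prefers the higher true return."""
    rng = random.Random(seed)
    n = len(true_weights)

    def random_segment():
        return [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(steps)]

    preferences = []
    for _ in range(n_pairs):
        seg_a, seg_b = random_segment(), random_segment()
        a_better = segment_return(true_weights, seg_a) > segment_return(true_weights, seg_b)
        preferences.append((seg_a, seg_b, 1.0 if a_better else 0.0))
    return preferences
```

Calling `train_reward_model(make_synthetic_preferences([1.0, -0.5]), n_features=2)` returns weights that can be compared with the simulated true reward. In a real study the labels would come from human annotators rather than from a known reward function.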