The authors introduce Themis, an explainable-AI (XAI) enabled testing and evaluation framework for safe reinforcement learning (RL) systems. It combines transparency, achieved through explainability, with alignment, achieved through human feedback.

  • Supports over 200 widely used environments and is easily configurable for experiments in RL, transparency, and alignment.
  • Uses human preferences to train reward models that match or outperform the environment's true reward signal.
  • Provides a user-friendly, auto-scaling cloud platform for collecting human feedback and managing experiments.
  • In tests, the platform supported one thousand users in back-to-back experiments on a modest commercial machine, with no extra development overhead.
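The summary does not say how Themis turns human preferences into a reward model. A common approach in preference-based RL is to fit a reward function so that, under a Bradley-Terry model, the segment a labeler preferred is more likely to win a pairwise comparison. The sketch below illustrates that idea only. Everything in it is an assumption rather than Themis's method: the linear reward over hand-made step features, the function names (`segment_return`, `train_reward_model`), and the simulated labeler that stands in for real human feedback.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical representation: a trajectory segment is a list of per-step
# feature vectors. This is not Themis's actual observation format.
Segment = List[List[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def segment_return(w: Sequence[float], seg: Segment) -> float:
    """Predicted return: sum of linear per-step rewards r(s) = w . phi(s)."""
    return sum(sum(wi * xi for wi, xi in zip(w, step)) for step in seg)


def segment_features(seg: Segment) -> List[float]:
    """Summed features, i.e. the gradient of segment_return with respect to w."""
    return [sum(step[i] for step in seg) for i in range(len(seg[0]))]


def train_reward_model(
    prefs: List[Tuple[Segment, Segment]],  # (preferred, rejected) pairs
    dim: int,
    lr: float = 0.1,
    epochs: int = 200,
) -> List[float]:
    """Fit w by gradient ascent on the Bradley-Terry log-likelihood
    sum over pairs of log sigmoid(R(preferred) - R(rejected))."""
    w = [0.0] * dim
    for _ in range(epochs):
        for good, bad in prefs:
            diff = segment_return(w, good) - segment_return(w, bad)
            p = sigmoid(diff)  # model's P(good is preferred over bad)
            fg, fb = segment_features(good), segment_features(bad)
            # d/dw log sigmoid(diff) = (1 - p) * (phi(good) - phi(bad))
            for i in range(dim):
                w[i] += lr * (1.0 - p) * (fg[i] - fb[i])
    return w


# Demo with a simulated labeler in place of real human feedback.
rng = random.Random(0)


def random_segment(length: int = 5) -> Segment:
    return [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(length)]


true_w = [1.0, 0.0]  # hypothetical ground truth: only feature 0 matters
prefs: List[Tuple[Segment, Segment]] = []
for _ in range(50):
    a, b = random_segment(), random_segment()
    # The labeler prefers whichever segment has the higher true return.
    if segment_return(true_w, a) >= segment_return(true_w, b):
        prefs.append((a, b))
    else:
        prefs.append((b, a))

w = train_reward_model(prefs, dim=2)
print("learned weights:", w)
```

Only the comparison labels reach the learner, never `true_w`, yet the learned weights should end up emphasizing feature 0. That is the sense in which a preference-trained reward model can recover, and be checked against, an environment's true reward signal. A practical system would typically replace the linear model with a neural network and the hand-rolled gradient step with an optimizer.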