REAR: Test-time Preference Realignment through Reward Decomposition
The authors introduce REAR, a framework that extends test-time scaling (TTS) to preference alignment by casting the task as a realignment problem. It addresses a limitation of existing TTS methods, which are typically restricted to verifiable domains such as mathematics and coding.