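A quality cliff at 6.25 Hz in neural audio codecs is caused by insufficient exposure to training tokens under a fixed clip duration. Because the number of tokens per clip scales with the frame rate, lower-frame-rate models see proportionally fewer tokens during training. Correcting this training configuration makes word error rate (WER) degrade smoothly down to 3.1 Hz and 1.6 Hz, which indicates that efficient low-frame-rate codecs are more achievable than previously thought.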
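As a rough illustration of the token-exposure argument, the sketch below computes tokens per clip as frame rate × clip duration. The 25 Hz reference rate, the 10 s clip length, and the helper names `tokens_per_clip` and `clip_seconds_for_tokens` are hypothetical assumptions chosen for the arithmetic, not values or code from the source. Lengthening clips to match a reference token count is only one possible way to restore exposure; the source does not state which adjustment it used.

```python
# Hypothetical illustration: at a fixed clip duration, the number of codec
# tokens (frames) per training clip scales linearly with the frame rate.

REFERENCE_RATE_HZ = 25.0   # assumed reference frame rate (not from the source)
FIXED_CLIP_SECONDS = 10.0  # assumed fixed clip duration (not from the source)


def tokens_per_clip(frame_rate_hz: float, clip_seconds: float) -> float:
    """Number of codec frames (tokens) produced for one clip."""
    return frame_rate_hz * clip_seconds


def clip_seconds_for_tokens(frame_rate_hz: float, target_tokens: float) -> float:
    """Clip duration needed so one clip at this frame rate yields target_tokens."""
    return target_tokens / frame_rate_hz


if __name__ == "__main__":
    # Token count a reference-rate model sees per fixed-length clip.
    target = tokens_per_clip(REFERENCE_RATE_HZ, FIXED_CLIP_SECONDS)
    for rate in (REFERENCE_RATE_HZ, 6.25, 3.1, 1.6):
        fixed = tokens_per_clip(rate, FIXED_CLIP_SECONDS)
        needed = clip_seconds_for_tokens(rate, target)
        print(
            f"{rate:6.2f} Hz: {fixed:6.1f} tokens per {FIXED_CLIP_SECONDS:.0f} s clip; "
            f"{needed:6.1f} s clip needed to match {target:.0f} tokens"
        )
```

Under these assumptions, a 10 s clip yields 250 tokens at 25 Hz but only 62.5 at 6.25 Hz, so a 6.25 Hz model would need 40 s clips (or four times as many clips) to see the same number of tokens.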
Low Frame Rate Degradation in Neural Audio Codecs