arXiv cs.CL · 3h ago

OLIVE: View-Augmented Latent Prediction with Waveform Reconstruction for Speech SSL

The authors propose OLIVE, a self-supervised speech representation learning framework that jointly optimizes analysis and synthesis objectives through view-augmented masked latent prediction and waveform reconstruction. This unified approach constrains early encoder features to retain signal-level information while shaping later contextual representations toward invariance for robust downstream performance.
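
As a rough illustration of what jointly optimizing the two objectives could look like, the sketch below adds a masked latent prediction term to a waveform reconstruction term. It is not the authors' implementation: the summary does not state the loss functions, the prediction targets, or how the terms are weighted, so the use of mean squared error for both terms, the scalar `recon_weight`, and every function name here are assumptions made for illustration. Under the view-augmented setup, `predicted` would presumably be computed from an augmented view of the input and `targets` from another view, which is also an assumption.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences of floats."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def masked_latent_loss(predicted, targets, mask):
    """Average per-frame MSE, computed over masked frames only.

    predicted, targets: lists of frames, each frame a list of floats.
    mask: list of bools, True where the frame was masked at the input.
    MSE is an assumed choice; the summary does not name the loss.
    """
    masked = [t for t, is_masked in enumerate(mask) if is_masked]
    if not masked:
        return 0.0
    return sum(mse(predicted[t], targets[t]) for t in masked) / len(masked)


def joint_loss(predicted, targets, mask, recon_wave, wave, recon_weight=1.0):
    """Sum of the prediction term and a weighted reconstruction term.

    recon_weight is a hypothetical hyperparameter, not taken from the paper.
    """
    prediction_term = masked_latent_loss(predicted, targets, mask)
    reconstruction_term = mse(recon_wave, wave)
    return prediction_term + recon_weight * reconstruction_term


# Toy example: two latent frames (only the second is masked) and a
# two-sample waveform.
predicted = [[0.0, 0.0], [1.0, 1.0]]
targets = [[0.0, 0.0], [0.0, 0.0]]
mask = [False, True]
wave = [0.0, 0.0]
recon_wave = [2.0, 0.0]

total = joint_loss(predicted, targets, mask, recon_wave, wave, recon_weight=0.5)
print(total)
```

In a setup like this, the reconstruction term is what would push the encoder to keep signal-level detail, while the prediction term acts on the contextual representations; how OLIVE attaches each term to specific encoder layers is not described in the summary.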