Wav2vec2 and WavLM Audio Classifier Stuck at 33% Accuracy
A user reports that fine-tuning wav2vec2-base or wavlm-base-plus for 3-class audio classification reaches only 33% accuracy, which matches chance level. Only the classification head is updated during training. The inputs are padded clips of 1.0 s duration, passed without attention masks, and the learning rate is 1e-3. The report also mentions class imbalance and short input clips as possible contributing factors.
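A quick way to tell whether 33% really means "no learning" is to compare it against the majority-class baseline and to check whether the model predicts a single class for every clip. The sketch below is only an illustration under assumptions: the helper names `majority_baseline` and `prediction_distribution` and the toy `labels` and `preds` lists are hypothetical and do not come from the report.

```python
from collections import Counter


def majority_baseline(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def prediction_distribution(preds, num_classes=3):
    """Fraction of predictions assigned to each class index."""
    counts = Counter(preds)
    return [counts.get(c, 0) / len(preds) for c in range(num_classes)]


def accuracy(preds, labels):
    """Plain accuracy: share of positions where prediction equals label."""
    correct = sum(1 for p, y in zip(preds, labels) if p == y)
    return correct / len(labels)


# Hypothetical toy data (assumption): balanced labels, collapsed predictions.
labels = [0, 1, 2, 0, 1, 2]
preds = [0, 0, 0, 0, 0, 0]

print("accuracy:", accuracy(preds, labels))
print("majority baseline:", majority_baseline(labels))
print("prediction distribution:", prediction_distribution(preds))
```

If the prediction distribution puts nearly all of its mass on one class, the head has likely collapsed to a constant output, and the accuracy then simply reflects that class's share of the evaluation set.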