This paper addresses the performance degradation of offensive comment detection models when deployed across different Chinese social media platforms by proposing a dual-threshold hard example mining method.
- A binary classification baseline is established by fine-tuning clean-Chinese-base RoBERTa on the COLD dataset.
- A three-class, finely labeled test set covering Weibo, Xiaohongshu, Tieba, and Zhihu is constructed, and domain distances are quantified using the Jaccard index and Proxy A-distance (PAD).
- Error-prone samples at both the high- and low-confidence ends are selected from unlabeled corpora by applying two thresholds to the model's prediction confidence.
- The model is fine-tuned a second time on a small set of manually labeled hard examples with implicit contexts, enabling low-cost cross-platform adaptation.
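As an illustration of the two domain-distance measures named in the list above, the following is a minimal sketch rather than the paper's implementation. It assumes that the Jaccard index is computed over character bigram sets (the tokenization is not stated in the source) and uses the standard definition of Proxy A-distance, PAD = 2(1 - 2ε), where ε is the held-out error of a classifier trained to distinguish two domains. The function names `char_ngrams`, `jaccard_index`, and `proxy_a_distance` are hypothetical.

```python
from typing import Iterable, Set


def char_ngrams(texts: Iterable[str], n: int = 2) -> Set[str]:
    """Collect the set of character n-grams in a corpus.

    Character bigrams are an assumption made for this sketch; a word
    segmenter could be substituted for Chinese text.
    """
    grams: Set[str] = set()
    for text in texts:
        grams.update(text[i:i + n] for i in range(len(text) - n + 1))
    return grams


def jaccard_index(a: Set[str], b: Set[str]) -> float:
    """Return |A ∩ B| / |A ∪ B|, defined as 1.0 when both sets are empty."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def proxy_a_distance(domain_clf_error: float) -> float:
    """Return PAD = 2 * (1 - 2 * err).

    `domain_clf_error` is the held-out error of a classifier trained to
    tell source-domain samples from target-domain samples. An error of
    0.5 (chance) gives 0, meaning the domains are indistinguishable; an
    error of 0 gives the maximum value of 2.
    """
    if not 0.0 <= domain_clf_error <= 1.0:
        raise ValueError("domain_clf_error must lie in [0, 1]")
    return 2.0 * (1.0 - 2.0 * domain_clf_error)
```

A higher Jaccard index indicates more lexical overlap between two platforms, whereas a higher PAD indicates that the platforms are easier to tell apart, so the two measures move in opposite directions as domains diverge.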
The optimized model achieves significant performance gains across the four tested platforms, demonstrating effective domain adaptation.
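The dual-threshold selection step can be sketched as follows. This is a hedged illustration, not the authors' code: the threshold values `low` and `high`, the function name `select_hard_candidates`, and the use of the predicted-class probability as the confidence score are all assumptions. The sketch only buckets unlabeled samples by confidence; deciding which of them are actually misclassified is left to the manual labeling described above.

```python
from typing import Dict, List, Sequence, Tuple


def select_hard_candidates(
    texts: Sequence[str],
    offensive_probs: Sequence[float],
    low: float = 0.6,    # hypothetical lower threshold
    high: float = 0.95,  # hypothetical upper threshold
) -> Dict[str, List[Tuple[str, float]]]:
    """Bucket unlabeled samples by the binary model's confidence.

    `offensive_probs[i]` is the model's probability that `texts[i]` is
    offensive. Confidence is the probability of the predicted class,
    so it always lies in [0.5, 1.0] for a binary classifier.
    """
    if len(texts) != len(offensive_probs):
        raise ValueError("texts and offensive_probs must have equal length")
    if not 0.5 <= low < high <= 1.0:
        raise ValueError("thresholds must satisfy 0.5 <= low < high <= 1.0")

    buckets: Dict[str, List[Tuple[str, float]]] = {
        "low_confidence": [],
        "high_confidence": [],
    }
    for text, p in zip(texts, offensive_probs):
        confidence = max(p, 1.0 - p)
        if confidence <= low:
            # Near the decision boundary: the model is unsure.
            buckets["low_confidence"].append((text, confidence))
        elif confidence >= high:
            # Very sure: worth auditing for confident mistakes on a new platform.
            buckets["high_confidence"].append((text, confidence))
    return buckets
```

Samples falling between the two thresholds are ignored in this sketch, which keeps the annotation budget focused on the two ends of the confidence range.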