This paper proposes a dual-threshold hard example mining method to address the performance degradation that offensive comment detection models suffer when deployed across different Chinese social media platforms.

  • A binary (offensive vs. non-offensive) baseline classifier is established by fine-tuning clean-Chinese-base RoBERTa on the COLD dataset.
  • A finely labeled three-class test set covering Weibo, Xiaohongshu, Tieba, and Zhihu is constructed and used to quantify domain distances via Jaccard similarity and Proxy A-distance (PAD).
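  • Two thresholds on the baseline's prediction confidence are used to filter error-prone samples from unlabeled corpora: high-confidence predictions that may nonetheless be wrong, and low-confidence predictions near the decision boundary.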
  • The baseline then undergoes secondary fine-tuning on a small set of manually labeled hard examples drawn from implicit contexts, achieving low-cost cross-platform adaptation.
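The two domain-distance measures can be illustrated with a minimal, standard-library-only sketch. It assumes character-level tokens (a simplification for Chinese text; the paper's actual tokenization is not stated here) and uses a hypothetical nearest-centroid classifier as a lightweight stand-in for whatever domain classifier produces the error rate ε in the usual Proxy A-distance formula PAD = 2(1 − 2ε). All function names are illustrative, not taken from the paper.

```python
import math
from collections import Counter


def char_vocab(texts):
    """Set of non-whitespace characters in a corpus (character tokens are an assumption)."""
    return {ch for text in texts for ch in text if not ch.isspace()}


def jaccard_similarity(texts_a, texts_b):
    """|A ∩ B| / |A ∪ B| over the two corpora's vocabularies."""
    a, b = char_vocab(texts_a), char_vocab(texts_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def proxy_a_distance(domain_error):
    """PAD = 2 * (1 - 2 * eps), where eps is a domain classifier's held-out error."""
    return 2.0 * (1.0 - 2.0 * domain_error)


def _bag_of_chars(text):
    return Counter(ch for ch in text if not ch.isspace())


def _centroid(texts):
    # Average character-count vector of a list of texts.
    total = Counter()
    for text in texts:
        total.update(_bag_of_chars(text))
    return {ch: count / len(texts) for ch, count in total.items()}


def _cosine(u, v):
    dot = sum(val * v.get(key, 0.0) for key, val in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def domain_classifier_error(texts_a, texts_b):
    """Held-out error of a hypothetical nearest-centroid classifier (corpus A vs. corpus B).

    The first half of each corpus builds a centroid; the second half is classified.
    Ties go to corpus A. Requires at least two texts per corpus.
    """
    if len(texts_a) < 2 or len(texts_b) < 2:
        raise ValueError("each corpus needs at least two texts")
    half_a, half_b = len(texts_a) // 2, len(texts_b) // 2
    cent_a = _centroid(texts_a[:half_a])
    cent_b = _centroid(texts_b[:half_b])
    test = [(t, 0) for t in texts_a[half_a:]] + [(t, 1) for t in texts_b[half_b:]]
    errors = 0
    for text, label in test:
        bag = _bag_of_chars(text)
        pred = 1 if _cosine(bag, cent_b) > _cosine(bag, cent_a) else 0
        errors += pred != label
    return errors / len(test)
```

A PAD near 0 means the classifier cannot tell the two corpora apart (ε ≈ 0.5), while a PAD near 2 means they are easily separable. In practice a stronger classifier (for example a linear SVM over word features) would replace the nearest-centroid stand-in, and the corpora would be real platform samples rather than toy strings.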
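The dual-threshold filtering step can be sketched as follows, assuming the baseline outputs a probability that each unlabeled comment is offensive. Confidence is taken as the probability of the predicted class; comments at or above an upper threshold form the high-confidence pool (candidates for confidently wrong predictions) and comments at or below a lower threshold form the low-confidence pool. The threshold values, the per-pool cap, and all names here are hypothetical illustrations, not values reported in the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class MinedPools:
    high: List[Tuple[str, float]]  # (text, confidence): confident, possibly confidently wrong
    low: List[Tuple[str, float]]   # (text, confidence): uncertain, near the 0.5 boundary


def confidence(p_offensive: float) -> float:
    """Confidence of a binary prediction = probability of the predicted class."""
    return max(p_offensive, 1.0 - p_offensive)


def mine_hard_examples(
    texts: Sequence[str],
    p_offensive: Sequence[float],
    tau_high: float = 0.95,  # hypothetical upper threshold
    tau_low: float = 0.60,   # hypothetical lower threshold
    max_per_pool: Optional[int] = None,  # hypothetical annotation budget per pool
) -> MinedPools:
    """Split unlabeled texts into high- and low-confidence candidate pools."""
    if len(texts) != len(p_offensive):
        raise ValueError("texts and probabilities must align")
    if not 0.5 <= tau_low < tau_high <= 1.0:
        raise ValueError("expected 0.5 <= tau_low < tau_high <= 1.0")
    scored = [(t, confidence(p)) for t, p in zip(texts, p_offensive)]
    # Most confident first in the high pool, least confident first in the low pool.
    high = sorted((s for s in scored if s[1] >= tau_high), key=lambda s: s[1], reverse=True)
    low = sorted((s for s in scored if s[1] <= tau_low), key=lambda s: s[1])
    if max_per_pool is not None:
        high, low = high[:max_per_pool], low[:max_per_pool]
    return MinedPools(high=high, low=low)
```

Comments falling between the two thresholds are skipped, so annotation effort is concentrated on the two extremes. The union of `pools.high` and `pools.low` would then be labeled manually and used for the secondary fine-tuning pass.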

The optimized model achieves significant performance gains across the four tested platforms, demonstrating effective domain adaptation.