This study addresses the conflation of grammatical gender and social semantic bias in contextual language models for grammatically gendered languages such as Spanish, and proposes a framework to disentangle the two dimensions. The authors construct balanced datasets from controlled templates and natural Wikipedia contexts, and use them to estimate gender directions while suppressing contamination.

  • A framework is designed with centroid, Support Vector Machine (SVM), and Linear Discriminant Analysis (LDA) gender direction estimators alongside contamination-aware weighting strategies.
  • Dual-objective evaluation metrics are introduced to balance the suppression of grammatical gender leakage on inanimate nouns with the preservation of semantic gender distinctions for occupation terms.
  • Results indicate that unweighted controlled contexts yield the purest grammatical gender direction, and that the centroid estimator outperforms the discriminative baselines (SVM and LDA).
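
As a rough illustration of what a centroid gender-direction estimator can look like, the sketch below computes the difference between two class centroids, normalizes it, and projects it out of a set of vectors. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`centroid_direction`, `remove_direction`, `mean_abs_projection`) are hypothetical, the data are synthetic Gaussian vectors rather than contextual embeddings, and the mean absolute projection is only a stand-in for the study's leakage metrics.

```python
import numpy as np


def centroid_direction(masc, fem):
    """Unit vector pointing from the feminine centroid to the masculine centroid."""
    d = masc.mean(axis=0) - fem.mean(axis=0)
    return d / np.linalg.norm(d)


def remove_direction(X, d):
    """Subtract from each row of X its component along the unit vector d."""
    return X - np.outer(X @ d, d)


def mean_abs_projection(X, d):
    """Average absolute projection of the rows of X onto d (illustrative leakage proxy)."""
    return float(np.mean(np.abs(X @ d)))


# Synthetic stand-in for embeddings (assumption): two clusters that differ
# only along the first coordinate axis.
rng = np.random.default_rng(0)
dim = 16
true_dir = np.zeros(dim)
true_dir[0] = 1.0
masc = rng.normal(size=(200, dim)) + 2.0 * true_dir
fem = rng.normal(size=(200, dim)) - 2.0 * true_dir

d = centroid_direction(masc, fem)
X = np.vstack([masc, fem])

before = mean_abs_projection(X, d)
after = mean_abs_projection(remove_direction(X, d), d)
print(f"mean |projection| before: {before:.3f}, after: {after:.2e}")
```

In this toy setting the estimated direction should align closely with the planted axis, and the projection onto it should vanish after removal. A discriminative variant would replace `centroid_direction` with the normalized weight vector of a linear classifier trained on the same two groups.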

The findings provide a method for isolating grammatical gender from semantic bias in contextual embeddings, offering a pathway for more accurate gender debiasing beyond static word embeddings.