Estimating Grammatical Gender Directions in Contextual Embeddings under Controlled and Natural Contexts

This study addresses the conflation of grammatical gender with social semantic bias in contextual language models for gendered languages such as Spanish, and proposes a framework to disentangle the two. The authors construct balanced datasets from controlled templates and natural Wikipedia contexts, and use them to estimate gender directions while suppressing contamination.

The framework combines three gender direction estimators, namely centroid, Support Vector Machine (SVM), and Linear Discriminant Analysis (LDA), with contamination-aware weighting strategies.
Dual-objective evaluation metrics are introduced to balance the suppression of grammatical gender leakage on inanimate nouns with the preservation of semantic gender distinctions for occupation terms.
Results indicate that unweighted controlled contexts yield the purest grammatical gender direction, and that the centroid estimator outperforms the discriminative SVM and LDA baselines.
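
As a rough illustration of what a centroid gender direction estimator computes, the sketch below takes the normalized difference between the mean embedding of masculine-noun contexts and the mean embedding of feminine-noun contexts, then measures how strongly a set of embeddings projects onto that direction before and after it is removed. The function names, the synthetic 8-dimensional data, and the mean absolute projection used as a leakage proxy are all assumptions made for this example; they are not the authors' implementation, weighting strategies, or evaluation metrics.

```python
import numpy as np


def centroid_direction(masc: np.ndarray, fem: np.ndarray) -> np.ndarray:
    """Unit vector pointing from the feminine centroid to the masculine centroid."""
    diff = masc.mean(axis=0) - fem.mean(axis=0)
    return diff / np.linalg.norm(diff)


def mean_abs_projection(embeddings: np.ndarray, direction: np.ndarray) -> float:
    """Average magnitude of each embedding's component along `direction`.

    Used here as a hypothetical proxy for grammatical gender leakage.
    """
    return float(np.abs(embeddings @ direction).mean())


def remove_direction(embeddings: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Project out the component along a unit-length `direction`."""
    return embeddings - np.outer(embeddings @ direction, direction)


# Synthetic stand-in for contextual embeddings (assumption: real inputs would
# come from a contextual model run on template or Wikipedia sentences).
rng = np.random.default_rng(0)
dim = 8
shift = np.zeros(dim)
shift[0] = 2.0  # gender signal planted on the first coordinate
masc = rng.normal(size=(200, dim)) + shift
fem = rng.normal(size=(200, dim)) - shift

g = centroid_direction(masc, fem)

# Treat a balanced subset as "inanimate nouns" for the leakage check.
inanimate = np.vstack([masc[:50], fem[:50]])
before = mean_abs_projection(inanimate, g)
after = mean_abs_projection(remove_direction(inanimate, g), g)

print(f"leakage proxy before removal: {before:.3f}")
print(f"leakage proxy after removal:  {after:.3e}")
```

In a dual-objective setting, the same projection would also be computed on occupation terms, where the aim is to preserve rather than suppress the gender distinction; how the two quantities are combined is left open here because the summary does not specify it.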

The findings provide a method for isolating grammatical gender from semantic bias in contextual embeddings, offering a pathway for more accurate gender debiasing beyond static word embeddings.