A lightweight GPT-2-style Transformer enables hierarchical feature extraction from vibration signals. The framework achieves 92.61% average accuracy using only 10% labeled data, outperforming state-of-the-art methods by 17.24 percentage points in cross-domain bearing fault diagnosis.
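To make the 10% labeled-data setting concrete, the sketch below shows one way such a split could be drawn. It is an illustration only: the text does not state how the labeled subset is selected, so the per-class (stratified) sampling, the function name `split_labeled_subset`, the seed, and the toy dataset of 4 fault classes with 50 samples each are all assumptions.

```python
import random
from collections import defaultdict


def split_labeled_subset(labels, labeled_fraction=0.10, seed=0):
    """Return (labeled_idx, unlabeled_idx), keeping `labeled_fraction` of each class.

    Hypothetical helper: stratified sampling is assumed here, not taken
    from the source.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    labeled = []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Keep at least one labeled sample per class.
        keep = max(1, round(len(indices) * labeled_fraction))
        labeled.extend(indices[:keep])

    labeled_set = set(labeled)
    unlabeled = [i for i in range(len(labels)) if i not in labeled_set]
    return sorted(labeled), unlabeled


# Hypothetical toy dataset: 4 fault classes, 50 samples each.
toy_labels = [c for c in range(4) for _ in range(50)]
labeled_idx, unlabeled_idx = split_labeled_subset(toy_labels)
print(len(labeled_idx), len(unlabeled_idx))
```

Sampling per class keeps every fault type represented in the small labeled set, which a purely random 10% draw would not guarantee on imbalanced data.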