This study applies sparse autoencoders to MolFormer to mechanistically examine how molecular representations are built across layers, challenging the assumption that chemical language models only learn surface-level syntax.
- Early layers rely on position-tracking latents to parse molecular grammar.
- Later layers encode atom-in-substructure and pharmacologically relevant features.
- Non-canonical SMILES shift the model's internal representations more than invalid SMILES do, because they disrupt the position-tracking latents.
- The authors developed InterMol, an interactive visualizer for SAE activations on molecular strings and structures.
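
To make the point about position concrete, the sketch below tokenizes three valid SMILES strings for the same molecule (ethanol) and reports where the oxygen token lands in each. It is only an illustration: the regex is a commonly used SMILES tokenization pattern and is assumed here, not taken from the study, and `tokenize` and `positions_of` are hypothetical helper names. Which spelling counts as canonical depends on the toolkit's canonicalization algorithm.

```python
import re

# Assumed tokenization pattern (not confirmed as the one used in the study):
# bracket atoms, two-letter halogens, organic-subset atoms, aromatic atoms,
# two-digit ring closures, single-digit ring closures, bonds and branches.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|%\d{2}|\d|[=#\-+/\\().:~@*$]"
)


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Guard against silently dropping characters the pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable SMILES: {smiles!r}")
    return tokens


def positions_of(smiles, token):
    """Return every token index at which `token` occurs."""
    return [i for i, t in enumerate(tokenize(smiles)) if t == token]


# Three valid spellings of ethanol: same molecule, different strings.
for variant in ["CCO", "OCC", "C(O)C"]:
    print(variant, tokenize(variant), positions_of(variant, "O"))
```

The same atom sits at a different token index, and next to different neighbouring tokens, depending on how the string is written. A feature that tracks position therefore sees a different input for each spelling even though the chemistry is unchanged.
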
The findings indicate that chemical language models encode meaningful semantic features beyond syntax, and InterMol supports further exploration of these internal representations.
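
For readers unfamiliar with sparse autoencoders, the following is a minimal sketch of the standard ReLU formulation: an overcomplete encoder, a linear decoder, and a loss that adds an L1 sparsity penalty to the reconstruction error. Everything in it is an assumption made for illustration. The summary does not state which SAE variant, dictionary size, or MolFormer layer the study used; the class name, dimensions, and `l1_coeff` are hypothetical; and the input vector is random noise standing in for a real hidden activation.

```python
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class SparseAutoencoder:
    """Standard ReLU sparse autoencoder (illustrative, untrained)."""

    def __init__(self, d_model, d_latent, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.W_enc = init(d_latent, d_model)   # d_latent > d_model: overcomplete
        self.b_enc = [0.0] * d_latent
        self.W_dec = init(d_model, d_latent)
        self.b_dec = [0.0] * d_model

    def encode(self, x):
        # Subtract the decoder bias, project, then ReLU so latents are >= 0.
        centered = [xi - bi for xi, bi in zip(x, self.b_dec)]
        pre = [p + b for p, b in zip(matvec(self.W_enc, centered), self.b_enc)]
        return [max(0.0, p) for p in pre]

    def decode(self, z):
        # Linear reconstruction of the original activation.
        return [r + b for r, b in zip(matvec(self.W_dec, z), self.b_dec)]

    def loss(self, x, l1_coeff=1e-3):
        z = self.encode(x)
        x_hat = self.decode(z)
        reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat))
        sparsity = sum(z)  # equals the L1 norm because every latent is >= 0
        return reconstruction + l1_coeff * sparsity


# Stand-in for one token's hidden activation (random, not from MolFormer).
rng = random.Random(1)
activation = [rng.gauss(0.0, 1.0) for _ in range(8)]

sae = SparseAutoencoder(d_model=8, d_latent=32)
latents = sae.encode(activation)
print("active latents:", sum(1 for z in latents if z > 0), "of", len(latents))
print("loss:", sae.loss(activation))
```

After training, each latent ideally fires for one interpretable pattern, which is what makes it possible to label a latent as tracking position or a particular substructure. The weights here are random, so the activations carry no such meaning.
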