GPT-2-sized language models cannot discover zero on their own at test time, regardless of pretraining. Performance does improve significantly, however, after training on tens to hundreds of examples of zero, and language pretraining roughly halves the number of examples required.