Stochastic momentum methods such as HB and ASGD exhibit distinct batch-size tradeoffs between compute efficiency and serial runtime. HB retains SGD-level compute efficiency over a batch-size window that extends up to a factor of $\sqrt{\kappa}$ beyond SGD's critical batch size. ASGD improves small-batch efficiency when the spectrum decays rapidly, but gives up that efficiency at larger batch sizes in exchange for reduced serial runtime.
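To make the two quantities concrete, the following is a minimal sketch of how one might measure them for a heavy-ball update on a toy quadratic. It is not the setup behind the results above. The diagonal quadratic objective, the additive Gaussian gradient noise scaled by one over the square root of the batch size, and the names `heavy_ball_sgd` and `steps_to_target` are all illustrative assumptions. Serial runtime is read here as the number of sequential steps needed to reach a target loss, and compute as that step count times the batch size.

```python
import numpy as np


def heavy_ball_sgd(eigs, batch_size, lr, momentum, steps, noise=1.0, seed=0):
    """Heavy-ball SGD on f(w) = 0.5 * sum_i eigs[i] * w[i]**2.

    Assumption: minibatch noise is modeled as additive Gaussian noise on the
    gradient with standard deviation noise / sqrt(batch_size).
    Returns the loss after each step.
    """
    rng = np.random.default_rng(seed)
    w = np.ones_like(eigs, dtype=float)  # start at the all-ones vector
    v = np.zeros_like(w)                 # momentum buffer
    losses = []
    for _ in range(steps):
        g = eigs * w + noise * rng.standard_normal(w.shape) / np.sqrt(batch_size)
        v = momentum * v - lr * g        # heavy-ball velocity update
        w = w + v
        losses.append(0.5 * np.sum(eigs * w**2))
    return np.array(losses)


def steps_to_target(losses, target):
    """First step count (1-indexed) at which the loss is <= target, else None."""
    hits = np.nonzero(losses <= target)[0]
    return int(hits[0]) + 1 if hits.size else None


if __name__ == "__main__":
    eigs = np.array([1.0, 0.1, 0.01])    # toy spectrum, condition number 100
    for batch_size in (1, 16, 256):
        losses = heavy_ball_sgd(eigs, batch_size, lr=1.0, momentum=0.9,
                                steps=2000, noise=0.1)
        t = steps_to_target(losses, target=1e-3)
        samples = None if t is None else t * batch_size
        print(f"B={batch_size}: serial steps={t}, samples processed={samples}")
```

Sweeping `batch_size` in a sketch like this gives one way to plot steps and samples against batch size; the learning rate and momentum here are arbitrary choices, not tuned values.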