HilDA is a self-supervised pretraining framework for LiDAR backbones that uses hierarchical distillation and temporal occupancy diffusion to improve the backbone's semantic and geometric understanding. It achieves state-of-the-art results on cross-modal distillation benchmarks and outperforms prior methods on 3D object detection, scene flow, and semantic occupancy prediction.