Largest model under 64 GB VRAM for distillation

A Reddit user is asking for the largest capable reasoning model that fits within 64 GB of VRAM, to be used for knowledge distillation.

The user has two R9700 GPUs (32 GB each), for 64 GB of VRAM in total.
They are willing to accept slow inference, around 12 tokens per second.
A 72-billion-parameter model is identified as fitting within these constraints.
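Whether a 72B model fits depends on the precision it runs at, and a rough check is simple arithmetic: weight memory is about the parameter count times the bytes per parameter. The Python sketch below is an illustration, not a sizing tool. It assumes idealized sizes (2 bytes for FP16/BF16, 1 byte for 8-bit, 0.5 bytes for 4-bit) and counts weights only. It ignores the KV cache, activations, and runtime overhead. Practical 4-bit formats also store scale data, so they usually need somewhat more than 0.5 bytes per parameter. The helper names `weight_memory_gb` and `fits` are hypothetical.

```python
# Rough estimate of the memory needed just to hold model weights.
# Assumption: weights only; KV cache, activations, and framework
# overhead are ignored, so real usage will be higher.

# Idealized bytes per parameter (assumption; real quantized formats
# carry extra scale/metadata and run slightly larger).
BYTES_PER_PARAM = {
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Return weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def fits(num_params: float, bytes_per_param: float, vram_gb: float) -> bool:
    """True if the weights alone fit in the given VRAM budget."""
    return weight_memory_gb(num_params, bytes_per_param) <= vram_gb


if __name__ == "__main__":
    params = 72e9  # 72-billion-parameter model
    vram = 64.0    # two 32 GB GPUs
    for name, bpp in BYTES_PER_PARAM.items():
        gb = weight_memory_gb(params, bpp)
        print(f"{name:>9}: {gb:6.1f} GB  fits={fits(params, bpp, vram)}")
```

Under these assumptions, FP16 weights (about 144 GB) and 8-bit weights (about 72 GB) both exceed 64 GB. The 4-bit weights (about 36 GB) fit and leave some headroom for the KV cache. This suggests that a 72B model would have to run quantized on this hardware.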