A Reddit user is asking for recommendations for the largest capable reasoning model that fits within 64 GB of VRAM, to use for knowledge distillation.

  • The user has dual R9700 GPUs providing 64 GB of total VRAM.
  • They are willing to accept slower inference speeds, such as 12 tokens per second.
  • A 72-billion-parameter model is identified as fitting within their hardware constraints (presumably quantized, since at 16 bits per weight the weights alone would need about 144 GB).
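
The claim that a 72B model fits in 64 GB can be sanity-checked with rough arithmetic. The sketch below is a weights-only estimate, not something taken from the post: the bit widths are illustrative assumptions, the function name weight_gb is hypothetical, and KV cache, activations, and runtime overhead are ignored, so real usage will be higher.

```python
# Rough, weights-only VRAM estimate. Assumptions (not from the post):
# the bit widths below are illustrative, and KV cache, activations,
# and framework overhead are ignored, so real usage is higher.

VRAM_GB = 64  # total across the two GPUs, as stated in the post


def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    # params_billions * 1e9 params * (bits / 8) bytes, divided by 1e9 bytes/GB
    return params_billions * bits_per_weight / 8


if __name__ == "__main__":
    for bits in (16, 8, 6, 4):
        size = weight_gb(72, bits)
        verdict = "fits" if size < VRAM_GB else "does not fit"
        print(f"72B at {bits:>2} bits/weight: ~{size:.0f} GB of weights, {verdict} in {VRAM_GB} GB")
```

Under these assumptions, a 72B model leaves headroom in 64 GB only at roughly 6 bits per weight or fewer; at 8 bits the weights alone (about 72 GB) already exceed the limit.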