A user on r/LocalLLaMA is asking for large language model recommendations that make full use of their hardware: 144GB of VRAM and 120GB of RAM. The poster currently uses Qwen3.6 27B and Gemma4 31B but wants a more powerful option for complex reasoning, coding, and tool calling.

  • Current setup includes Minimax M2.7 at Q6 quantization, which needs 207GB of base memory before KV cache and context are accounted for.
  • The user is debating between moving to Minimax M3 at Q3 quantization or finding other "chonky" models.
  • The goal is to maximize intelligence on tasks where long response times are acceptable, prioritizing accuracy over speed.
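
The memory figures above imply a rough budget, sketched below in Python. This is a back-of-the-envelope calculation, not a measurement: it assumes VRAM and RAM can be pooled for a single model (as with partial GPU offload), treats the 207GB figure as the full base footprint, and ignores OS and runtime overhead. The function names are hypothetical.

```python
# Rough memory budget from the figures in the post (all values in GB).
# Assumptions: VRAM and RAM are pooled for one model, and OS/runtime
# overhead is ignored, so real headroom will be smaller than this.

VRAM_GB = 144
RAM_GB = 120
M27_Q6_BASE_GB = 207  # base memory quoted for Minimax M2.7 at Q6


def headroom_gb(vram_gb: float, ram_gb: float, base_gb: float) -> float:
    """Memory left for KV cache and context after the base footprint."""
    return vram_gb + ram_gb - base_gb


def spill_to_ram_gb(vram_gb: float, base_gb: float) -> float:
    """Portion of the base footprint that cannot fit in VRAM."""
    return max(0.0, base_gb - vram_gb)


if __name__ == "__main__":
    print("Total memory:", VRAM_GB + RAM_GB, "GB")
    print("Headroom for KV cache and context:",
          headroom_gb(VRAM_GB, RAM_GB, M27_Q6_BASE_GB), "GB")
    print("Base footprint spilling into RAM:",
          spill_to_ram_gb(VRAM_GB, M27_Q6_BASE_GB), "GB")
```

Under these assumptions, the Q6 model leaves 57GB of the combined 264GB for KV cache and context, and at least 63GB of the base footprint has to sit in system RAM.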

The post invites community comparisons, specifically asking whether M3@Q3 is equivalent to M2.7@Q6, to help the poster choose the best model for this hardware.