A Reddit user asks the community about their experiences running large language models of 100 to 250 billion parameters at Q1 or Q2 quantization levels. The post lists specific models in this size range, such as DeepSeek-V4-Flash and Qwen3-235B-A22B, and contrasts them with smaller models, where such low-bit quantization is generally discouraged.

  • The author seeks feedback on the usability of these low-bit quantizations for agentic coding, writing, and chatting tasks.
  • Users are asked to report specific issues such as looping, repetition, or tool-calling failures when using Q1/Q2 on large models.
  • The poster notes that while Q3 is sometimes used for medium models like MiniMax-M2 due to VRAM constraints, they suspect large models may tolerate lower-bit quantization better.

The discussion aims to determine whether Q1/Q2 quantization provides sufficient quality for practical use with very large models, which could enable their deployment on hardware with limited resources.
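
To make the hardware motivation concrete, the following is a rough back-of-the-envelope sketch of weight storage at different bit widths. The function name `weight_memory_gb` and the bit widths used are illustrative assumptions, not figures from the post. Real quantization formats also store scales and often keep some tensors at higher precision, so their effective bits per weight are higher than the nominal value, and the estimate ignores KV cache and runtime overhead.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes).

    Assumption: every parameter is stored at exactly `bits_per_weight` bits.
    KV cache, activations, and format overhead are not included.
    """
    # params_billion * 1e9 parameters * bits / 8 bits-per-byte / 1e9 bytes-per-GB
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    # Hypothetical nominal bit widths for comparison; actual quant formats differ.
    for bits in (16.0, 4.0, 3.0, 2.0, 1.0):
        size = weight_memory_gb(235, bits)
        print(f"235B parameters at {bits:>4} bits/weight: ~{size:.1f} GB")
```

Under these assumptions, a 235-billion-parameter model needs about 470 GB for weights at 16 bits per weight and about 58.75 GB at a nominal 2 bits, which is the gap that makes Q1/Q2 attractive on limited hardware if quality holds up.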