NotKshitiz has released LitmusLab, a command-line tool that automates the comparison of multiple large language model quantization formats. It replaces manual, one-at-a-time testing of quantization options with a side-by-side evaluation framework.

  • Supports FP16, INT8, NF4, FP4, HQQ, Quanto INT8/INT4, AWQ, GPTQ, and FP8 formats.
  • Integrates with HuggingFace Transformers and vLLM backends.
  • Includes adaptive VRAM budgeting to prevent out-of-memory errors on smaller GPUs.
  • Features per-mode failure handling so one broken configuration does not halt the entire run.
  • Offers an optional AI-generated deployment recommendation via Groq, or a fully offline deterministic mode.
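
The per-mode failure handling listed above can be illustrated with a minimal Python sketch. This is not LitmusLab's actual code: the names `ModeResult`, `run_all_modes`, and `fake_benchmark` are hypothetical, and the throughput numbers are placeholders rather than measurements. It only shows the general pattern of catching each mode's error so the remaining modes still run.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ModeResult:
    """Outcome of benchmarking one quantization mode (hypothetical structure)."""
    mode: str
    ok: bool
    tokens_per_sec: Optional[float] = None
    error: Optional[str] = None


def run_all_modes(modes: List[str], benchmark: Callable[[str], float]) -> List[ModeResult]:
    """Run `benchmark` for every mode, recording failures instead of aborting."""
    results: List[ModeResult] = []
    for mode in modes:
        try:
            results.append(ModeResult(mode, True, tokens_per_sec=benchmark(mode)))
        except Exception as exc:
            # One broken configuration is logged as a failed result;
            # the loop continues with the next mode.
            results.append(ModeResult(mode, False, error=f"{type(exc).__name__}: {exc}"))
    return results


def fake_benchmark(mode: str) -> float:
    """Stand-in for a real benchmark; values are made up for illustration."""
    if mode == "awq":
        raise RuntimeError("kernel not available")
    return {"fp16": 40.0, "nf4": 55.0}[mode]


if __name__ == "__main__":
    for result in run_all_modes(["fp16", "awq", "nf4"], fake_benchmark):
        print(result)
```

In this sketch the simulated "awq" failure is reported alongside the successful "fp16" and "nf4" results, so a single unsupported format does not cost the user the rest of the run.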

The tool aims to streamline the selection of a quantization format by automating the benchmarking of VRAM usage, speed, and quality tradeoffs on the target hardware.