A Reddit user asks whether a solid leaderboard exists that compares closed-source and open-weight large language models side by side. They note that most available benchmarks feel fragmented and fail to address the practical differences between running models locally versus using API-based services.
- The user seeks a clear comparison between local open-weight models and competitive API-only models.
- They ask whether any open models match the performance of GLM-5.2 or Qwen3.6 27B within their size constraints.
- The user observes that models in the 70B–350B parameter range often need far more VRAM without delivering a proportional improvement in real-world quality.
The post highlights a community need for better evaluation metrics to determine which models are actually worth running locally given hardware limitations.
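To make the hardware side of that question concrete, here is a rough sketch of how weight memory scales with parameter count. It is not from the post: the helper name `weight_memory_gib` and the bit widths (16, 8, 4) are illustrative assumptions, and the estimate covers model weights only, ignoring KV cache, activations, and runtime overhead, so real requirements will be higher.

```python
def weight_memory_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GiB.

    Hypothetical helper for illustration: parameters * bits / 8 gives bytes.
    KV cache, activations, and framework overhead are NOT included.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Sizes mentioned in the post (27B, and the 70B to 350B range),
# at a few assumed precisions.
for params in (27, 70, 350):
    for bits in (16, 8, 4):
        gib = weight_memory_gib(params, bits)
        print(f"{params:>4}B @ {bits:>2}-bit: ~{gib:7.1f} GiB (weights only)")
```

Under these assumptions a 70B model at 16-bit needs roughly 130 GiB for weights alone, and a 350B model still needs about 163 GiB even at 4-bit, which illustrates why the user asks whether the quality gain justifies the hardware.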