ObviousBench: A Benchmark for Visible LLM Failures in Smaller Models
ObviousBench is a new benchmark for evaluating visible failures in large language models, with a focus on how configuration choices affect error rates. Rather than only ranking models by performance, it highlights the trade-offs among model size, speed, and reasoning capability.