A revised benchmark of local vision language models evaluates 23 models across 30 images with 3 tests each, totaling 2,070 tests and 60 to 70 inference hours. The top-performing model is Qwen3.6 27B (nothink) at Q4 with a 79.6 score, followed by Qwen3.5 4B (nothink) at Q4, and Qwen3-VL 8B at Q8. Key findings include thinking mode degrading vision performance, MoE models underperforming compared to dense models, and Q8 quantization not universally improving results.
Updated Vision Model Benchmark Results and Recommendations
from English