A follow-up benchmark evaluates DeepSeek V4 Flash running on two RTX PRO 6000 GPUs with vLLM, comparing it on real-world coding tasks against API-based models such as Claude Sonnet and Opus. Opus and Fable still produce higher-quality code, but DeepSeek V4 Flash reaches roughly Sonnet-level quality with significantly shorter wall-clock times.

  • DeepSeek V4 Flash averages 2 minutes per task, whereas Sonnet 5 takes approximately 6 minutes, or roughly three times as long.
  • The test used OpenCode for local models and Claude Code for API models, so the results reflect typical user setups rather than isolated model performance.
  • Qwen 3.6 models were included as reference points to anchor the comparison.
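
The wall-clock comparison in the list above is simple arithmetic, and a minimal sketch can make the "three times" figure explicit. The function name `slowdown_factor` and the variable names are hypothetical, chosen for illustration only; the two inputs are the approximate per-task averages quoted in the summary, not raw benchmark data.

```python
def slowdown_factor(baseline_minutes: float, other_minutes: float) -> float:
    """Return how many times longer `other` takes than `baseline`."""
    if baseline_minutes <= 0:
        raise ValueError("baseline_minutes must be positive")
    return other_minutes / baseline_minutes


# Approximate per-task averages quoted in the summary (minutes).
deepseek_v4_flash_minutes = 2.0
sonnet_minutes = 6.0

ratio = slowdown_factor(deepseek_v4_flash_minutes, sonnet_minutes)
print(f"Sonnet takes about {ratio:.1f}x as long per task as DeepSeek V4 Flash")
```

Because both inputs are rounded averages, the ratio should be read as an estimate rather than a precise measurement.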

The results suggest that local models are becoming highly competitive in speed and quality, provided users can optimize away dense attention overheads.