The llama.cpp b9840 release introduces conversion support for the DeepSeek V4 model, including specific handling for the Pro variant. This update integrates the new architecture into the library alongside various internal optimizations and bug fixes.
- Added dsv4 conversion, llm_graph_input_dsv4, and save-load state functionality.
- Enabled Flash Attention (FA) with necessary padding and graph reuse mechanisms.
- Added support for multi-sequence processing and partial checkpointing.
- Released binaries for macOS, Linux, Android, Windows, and openEuler, with builds for CPU, GPU, and backends such as ROCm, SYCL, and OpenVINO.
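The Flash Attention item above mentions padding without saying what is padded or to which size. As a rough illustration of the general idea only, the sketch below rounds a length up to the next multiple of a block size, which is the usual way a buffer is padded so a kernel can process it in fixed-size tiles. The helper name `pad_to_multiple` and the block size of 256 are assumptions made for this example and are not taken from the release.

```python
def pad_to_multiple(n: int, multiple: int) -> int:
    """Round n up to the nearest multiple of `multiple`."""
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    # Integer ceiling division, then scale back up.
    return ((n + multiple - 1) // multiple) * multiple


# Hypothetical block size, chosen only for illustration.
PAD = 256

for n_tokens in (1, 255, 256, 257, 1000):
    print(n_tokens, "->", pad_to_multiple(n_tokens, PAD))
```

Lengths that are already a multiple of the block size are left unchanged, so padding only adds work when the input does not fill its last tile.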
This release allows users to run DeepSeek V4 models locally using llama.cpp on a wide variety of hardware configurations.
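As a minimal sketch of what the conversion step might look like in practice, the snippet below assembles a command line for llama.cpp's `convert_hf_to_gguf.py` script using its `--outfile` and `--outtype` options. The model directory and output file name are hypothetical placeholders, the helper `build_convert_command` is invented for this example, and whether a given DeepSeek V4 checkpoint needs extra options is not stated in the release notes.

```python
import shlex


def build_convert_command(model_dir: str, outfile: str, outtype: str = "f16") -> list[str]:
    """Build the argument list for llama.cpp's Hugging Face to GGUF converter."""
    return [
        "python",
        "convert_hf_to_gguf.py",  # script in the llama.cpp repository root
        model_dir,                # directory holding the downloaded checkpoint
        "--outfile", outfile,     # where to write the GGUF file
        "--outtype", outtype,     # output tensor type
    ]


# Hypothetical paths, used only for illustration.
cmd = build_convert_command("models/deepseek-v4", "deepseek-v4-f16.gguf")

# Print a copy-pasteable shell command; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

Building the arguments as a list rather than a single string avoids shell-quoting problems if the model path contains spaces.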