Hugging Face has recently changed its download handling to block multi-threaded download acceleration: when a client opens several threads, all but one receive HTTP 403 errors. The change hits the GGUF ecosystem hard, because large single-file models are common there and single-thread speeds are often capped at 40 MB/s. Previously, tools like the Hugging Face CLI accelerated downloads by fetching multiple smaller files in parallel, a method now hindered by this restriction. The author notes that downloading an entire branch of a GGUF repository is inefficient, since a single branch typically holds large files and multiple quantizations. Unless the policy is reversed, download speeds will remain slow unless uploaders start splitting models into many smaller files spread across different branches. That workaround would force users to merge the files manually, which the author considers less desirable than Hugging Face restoring the previous acceleration capabilities.
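For readers unfamiliar with how multi-threaded acceleration of a single file usually works, the sketch below shows the general idea: the client divides the file into byte ranges and requests each one with a separate HTTP `Range` header. This is a generic illustration, not Hugging Face's or any particular downloader's implementation; the function name `split_byte_ranges` and its parameters are hypothetical.

```python
def split_byte_ranges(total_size: int, parts: int) -> list[str]:
    """Return HTTP Range header values that together cover a file of
    `total_size` bytes in at most `parts` contiguous chunks.

    Hypothetical helper for illustration only: a segmented downloader
    would send one request per returned value, typically one per thread.
    """
    if total_size <= 0 or parts <= 0:
        raise ValueError("total_size and parts must be positive")

    # Never create more chunks than there are bytes.
    parts = min(parts, total_size)
    base, extra = divmod(total_size, parts)

    ranges = []
    start = 0
    for i in range(parts):
        # Spread the remainder over the first `extra` chunks.
        length = base + (1 if i < extra else 0)
        end = start + length - 1  # HTTP byte ranges are inclusive
        ranges.append(f"bytes={start}-{end}")
        start = end + 1
    return ranges


if __name__ == "__main__":
    # A 10-byte file split across 3 threads.
    print(split_byte_ranges(10, 3))
```

Under the restriction described above, only one of these ranged requests would succeed at a time, so splitting a single large file this way no longer yields a speedup.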
Hugging Face Blocks Multi-Threaded Downloads, Impacting GGUF Ecosystem