USAF enables fine-tuning MoE models on GPUs that only support inference

The author introduces USAF, a new sparse fine-tuning method for Mixture of Experts (MoE) models. It is designed to make fine-tuning possible on hardware that would otherwise only be capable of running inference.

The method trains sparse expert weights and the router instead of using adapters.
It allows fine-tuning of Qwen3-30B-A3B on an AMD RX 6750 XT with 12 GB of VRAM.
The project is open source under the Apache 2.0 license.
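
The idea of training only a sparse subset of expert weights plus the router can be illustrated with a toy masked update. This is a minimal sketch and not the USAF implementation: the function names, the random choice of trainable indices, the 20% density, and the plain SGD rule are all assumptions made for illustration.

```python
import random

def make_sparse_mask(num_weights, density, seed=0):
    """Pick a fixed subset of weight indices to train.

    Random selection is an assumption of this sketch; the source does not
    say how USAF chooses which expert weights are trainable.
    """
    rng = random.Random(seed)
    k = max(1, int(num_weights * density))
    return set(rng.sample(range(num_weights), k))

def masked_sgd_step(weights, grads, mask, lr):
    """Apply an SGD update only at indices in `mask`; other weights stay frozen."""
    return [
        w - lr * g if i in mask else w
        for i, (w, g) in enumerate(zip(weights, grads))
    ]

# Toy "expert" with 10 weights, of which 20% are marked trainable.
expert = [1.0] * 10
expert_grads = [0.5] * 10
mask = make_sparse_mask(len(expert), density=0.2)
updated = masked_sgd_step(expert, expert_grads, mask, lr=0.1)

# The router is small compared with the experts, so this sketch trains it densely.
router = [0.0, 0.0, 0.0]
router_grads = [1.0, -1.0, 0.5]
router_updated = masked_sgd_step(
    router, router_grads, set(range(len(router))), lr=0.1
)

# Indices of expert weights that actually changed.
changed = [i for i, (a, b) in enumerate(zip(expert, updated)) if a != b]
print(f"trainable expert weights: {len(changed)} of {len(expert)}")
```

In a scheme like this, only the masked entries need gradients and optimizer state, and no extra adapter matrices are added to the model. Whether USAF gets its memory savings this way is not stated in the summary above.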

This approach aims to make MoE model customization accessible on consumer hardware by lowering the high hardware requirements typically associated with fine-tuning.