The author argues that acquiring new hardware should be used for supervised fine-tuning (SFT) and reinforcement fine-tuning (RFT) rather than standard model benchmarking. This approach offers a viable path to monetization by leveraging open models, especially as proprietary APIs become less accessible or more expensive.
- Post-training requires balancing quality and speed, with data mixing and synthesis being critical for performance.
- Model characteristics significantly impact training; Qwen models are difficult to fine-tune due to saturated knowledge, while Llama models absorb new information more easily.
- Reinforcement fine-tuning involves a complex mix of inference rollouts and weight updates using methods like PPO or GRPO.
- Engineering skills are essential for building low-power, massively parallel stacks that allow for rapid iteration cycles.
Custom post-training is presented as one of the few remaining opportunities in the open model space, offering potential income despite being competitive and hardware-dependent.