A Reddit user outlines several motivations for choosing to run large language models locally rather than relying on commercial APIs.
- Users can fine-tune any model on any dataset of their choice.
- Techniques like speculative decoding can be used to maximize tokens per second.
- Running locally ensures that data is not shared with providers like Anthropic or OpenAI.
- Hardware is reusable for vision, text, and speech tasks, allowing free use of any model blend.
- Users can curate datasets without worrying about API costs.
The post highlights the benefits of control, privacy, and cost-efficiency associated with local inference.