Proposal for crowd-sourced, open-source distilled LLMs via distributed training

A Reddit user proposes a system for creating truly open-source distilled large language models by wrapping existing command-line AI services. With volunteers' participation, the wrapper would record the inputs they send to applications like coding assistants or chatbots, together with the outputs they receive, to build massive datasets for distillation.
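As a rough illustration of the wrapping idea, the sketch below runs a command-line tool, passes it a prompt on standard input, and appends the prompt and response to a JSON Lines file. The proposal does not specify any of these details: the function name `run_and_log`, the record fields, the JSONL format, and the stand-in `FAKE_ASSISTANT` command are all assumptions made for illustration, and a real assistant CLI may take its prompt differently.

```python
import json
import subprocess
import sys
import time
from pathlib import Path


def run_and_log(command, prompt, log_path):
    """Run a CLI tool with `prompt` on stdin and append the exchange to a JSONL file.

    Hypothetical helper: assumes the tool reads its prompt from stdin, writes its
    answer to stdout, and that the volunteer has agreed to the logging.
    """
    result = subprocess.run(
        command,
        input=prompt,
        capture_output=True,
        text=True,
        check=True,  # raise if the wrapped tool exits with an error
    )
    record = {
        "timestamp": time.time(),
        "command": command,
        "prompt": prompt,
        "response": result.stdout,
    }
    # One JSON object per line, so contributions can be concatenated later.
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return result.stdout


# Stand-in for a real assistant CLI (an assumption): upper-cases whatever it reads.
FAKE_ASSISTANT = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]
```

Calling `run_and_log(FAKE_ASSISTANT, "hello", "log.jsonl")` returns the tool's output and leaves one record in `log.jsonl`. An append-only line format is used here only because it makes merging many volunteers' files simple.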

The proposal suggests distributing the model training phase across gamers' GPUs, accepting slower computation in exchange for the ability to scale with the number of volunteers. It acknowledges that establishing a trusted central authority for coordination and data release is the primary challenge, though starting with smaller models could help build trust over time.
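The proposal does not say how the training work would be split. One common scheme, shown here purely as an assumption, is synchronous data-parallel training: each volunteer computes a gradient on its own shard of data, and a coordinator averages those gradients before updating the shared weights. The sketch below simulates this in a single process for a one-parameter model `y = w * x`; the function names, the toy data, and the learning rate are all hypothetical.

```python
def local_gradient(w, shard):
    """Gradient of mean squared error for y = w * x on one volunteer's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def train(shards, steps=200, lr=0.01):
    """Coordinator loop: average the volunteers' gradients, then update the weight."""
    w = 0.0
    for _ in range(steps):
        # In a real deployment each call would run on a different volunteer's GPU.
        grads = [local_gradient(w, shard) for shard in shards]
        w -= lr * sum(grads) / len(grads)
    return w


# Three hypothetical volunteers, each holding a shard of data generated by y = 3x.
shards = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0), (4.0, 12.0)],
    [(5.0, 15.0), (6.0, 18.0)],
]
```

With equally sized shards, the averaged gradient equals the gradient over the full dataset, so `train(shards)` converges toward the true weight of 3. The coordinator in this loop plays the role the proposal assigns to the trusted central authority. The sketch leaves out slow or unreliable volunteers and any checking of submitted gradients.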

The author notes that while the concept requires significant infrastructure and coordination, it offers a potential pathway for community-driven model development if sufficient volunteers can be mobilized.