A Reddit user questions the practical utility of Retrieval-Augmented Generation (RAG) for personal projects involving coding, sysadmin work, and small codebases. The author argues that models already cover standard industry knowledge well, while project-specific sources such as codebases or API references are either too small to need indexing or too large to manage efficiently.
- RFC libraries are considered verbose and unnecessary.
- Industry standards are typically better handled directly by the model than by cherry-picked documents.
- Personal codebases are often small enough to fit in the context window directly, and they change too frequently for an index to stay current.
- Managing entire API references for large languages like C# or Node.js is viewed as excessive overhead.
- Historical context is deemed relevant only for enterprise applications with massive scale, not smaller projects.
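The claim about small codebases can be checked roughly before building any index. The sketch below is illustrative only and does not come from the post: the function names, the extension list, the 128,000-token window, and the 4-characters-per-token ratio are all assumptions, and real tokenizers will give different counts.

```python
import os

# Assumption: ~4 characters per token is a rough heuristic, not a tokenizer.
CHARS_PER_TOKEN = 4
# Assumption: which file types count as "the codebase" (hypothetical list).
SOURCE_EXTS = {".py", ".cs", ".js", ".ts", ".md"}


def estimate_tokens(root, exts=SOURCE_EXTS):
    """Walk `root` and estimate the token count of all matching source files."""
    total_chars = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1] in exts:
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as f:
                    total_chars += len(f.read())
    return total_chars // CHARS_PER_TOKEN


def fits_in_context(root, context_tokens=128_000, reserve=0.5):
    """Return True if the estimate fits in the share of the window not
    reserved for the conversation and the model's output."""
    budget = int(context_tokens * (1 - reserve))
    return estimate_tokens(root) <= budget
```

If `fits_in_context` returns True for a project, pasting the relevant files into the prompt is probably simpler than maintaining an index; if it returns False, retrieval of some kind starts to look more justified.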
The post seeks community advice on what content is actually useful to include in RAG systems and how to manage long-term maintenance for large datasets like full API documentation.
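On the maintenance question, one common approach (not something the post itself proposes) is to re-embed only the documents whose content has changed since the last indexing run. The sketch below is a minimal illustration under that assumption: `plan_reindex` and the manifest layout are hypothetical names, and the actual embedding and vector store calls are left out.

```python
import hashlib


def content_hash(text):
    """Return a stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(documents, manifest):
    """Compare current documents against the hashes stored on the last run.

    documents: dict mapping doc id -> current text
    manifest:  dict mapping doc id -> hash recorded at the last indexing run
    Returns (ids to embed again, ids to delete from the index, new manifest).
    """
    new_manifest = {doc_id: content_hash(text) for doc_id, text in documents.items()}
    # New or changed documents: the hash is missing or differs from last time.
    to_embed = sorted(d for d, h in new_manifest.items() if manifest.get(d) != h)
    # Documents that disappeared since the last run.
    to_delete = sorted(d for d in manifest if d not in new_manifest)
    return to_embed, to_delete, new_manifest
```

Persisting the returned manifest between runs (for example as a JSON file) keeps the cost of each update proportional to what changed rather than to the size of the whole API reference.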