A study evaluates whether cluster-based semantic chunking improves retrieval and answer quality in Retrieval-Augmented Generation (RAG) systems compared with fixed-size and recursive chunking strategies. The evaluation focuses on long, structured academic theses and uses the RAGAs framework. The main findings are:
- Cluster-based chunking did not outperform simpler strategies under the tested configuration.
- Performance differed substantially between fixed and document-specific questions, a gap likely related to document formatting and preprocessing.
- RAGAs-based faithfulness scores showed limited reliability in this setup.
The findings suggest that more complex chunking methods may not provide advantages over simpler approaches for this specific use case.
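
To make the two simpler baselines concrete, the following is a minimal sketch of fixed-size and recursive chunking. It is not the study's implementation. The function names, the character-based length limit, and the separator order (paragraph, line, sentence, word) are all assumptions chosen for illustration.

```python
from typing import List, Sequence


def fixed_size_chunks(text: str, size: int, overlap: int = 0) -> List[str]:
    """Cut text into windows of `size` characters, ignoring its structure.

    Hypothetical helper: consecutive windows share `overlap` characters.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


def recursive_chunks(
    text: str,
    max_len: int,
    separators: Sequence[str] = ("\n\n", "\n", ". ", " "),
) -> List[str]:
    """Split on the coarsest separator first, recursing on oversized pieces.

    Hypothetical helper: the separator order is an assumption, not taken
    from the study.
    """
    if len(text) <= max_len:
        return [text] if text.strip() else []
    if not separators:
        # No structure left to exploit: fall back to fixed-size windows.
        return fixed_size_chunks(text, max_len)

    sep, finer = separators[0], separators[1:]
    pieces = text.split(sep)
    if len(pieces) == 1:
        # This separator does not occur; try the next, finer one.
        return recursive_chunks(text, max_len, finer)

    chunks: List[str] = []
    buffer = ""
    for piece in pieces:
        candidate = piece if not buffer else buffer + sep + piece
        if len(candidate) <= max_len:
            buffer = candidate  # keep merging small pieces
            continue
        if buffer.strip():
            chunks.append(buffer)
        if len(piece) <= max_len:
            buffer = piece
        else:
            # The piece alone is too long: split it with finer separators.
            chunks.extend(recursive_chunks(piece, max_len, finer))
            buffer = ""
    if buffer.strip():
        chunks.append(buffer)
    return chunks


sample = (
    "Intro paragraph.\n\n"
    "Second paragraph is a bit longer than the first.\n\n"
    "End."
)
fixed = fixed_size_chunks(sample, size=40)
recursive = recursive_chunks(sample, max_len=40)
```

On the sample text, the fixed-size splitter cuts wherever the character count dictates, while the recursive splitter keeps the short paragraphs intact and breaks only the oversized one at a word boundary. Cluster-based semantic chunking would additionally group segments by embedding similarity, which this sketch does not attempt.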