A study evaluates whether cluster-based semantic chunking improves retrieval and answer quality in Retrieval-Augmented Generation (RAG) systems compared to fixed-size and recursive chunking strategies. The evaluation focuses on long, structured academic theses using the RAGAs framework.

  • Cluster-based chunking did not outperform simpler strategies under the tested configuration.
  • Performance differed substantially between fixed and document-specific questions, a gap likely related to document formatting and preprocessing.
  • The RAGAs faithfulness metric showed limited reliability in this setup.

The findings suggest that more complex chunking methods may not provide advantages over simpler approaches for this specific use case.
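
For readers unfamiliar with the two baseline strategies named above, the following is a minimal sketch of fixed-size and recursive chunking. It is not the study's implementation: the function names, the character-based size limit (real pipelines often count tokens instead), and the separator order are all assumptions chosen for illustration. Cluster-based chunking is not shown, since the summary does not describe how it was configured.

```python
def fixed_size_chunks(text, size, overlap=0):
    """Cut text into windows of at most `size` characters (hypothetical helper).

    Boundaries ignore sentence and paragraph structure entirely.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


def recursive_chunks(text, size, separators=("\n\n", "\n", " ")):
    """Split on the coarsest separator first, recursing only when a piece
    is still longer than `size` (hypothetical helper).

    The separator order (paragraph, line, word) is an assumption.
    """
    if len(text) <= size:
        return [text] if text.strip() else []
    if not separators:
        # No structure left to exploit: fall back to fixed-size windows.
        return fixed_size_chunks(text, size)

    sep, finer = separators[0], separators[1:]
    chunks, buf = [], ""
    for piece in text.split(sep):
        candidate = buf + sep + piece if buf else piece
        if len(candidate) <= size:
            buf = candidate  # keep merging small pieces into one chunk
            continue
        if buf.strip():
            chunks.append(buf)
        if len(piece) <= size:
            buf = piece
        else:
            # Piece is too long on its own: retry with finer separators.
            chunks.extend(recursive_chunks(piece, size, finer))
            buf = ""
    if buf.strip():
        chunks.append(buf)
    return chunks
```

The contrast is in where boundaries fall: the fixed-size variant may cut mid-word, while the recursive variant prefers paragraph, then line, then word boundaries and only falls back to raw windows when no separator is left.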