This study investigates the performance of small language models during the generation stage within a Retrieval-Augmented Generation (RAG) system. The research benchmarks these models using diverse open-source and proprietary datasets to evaluate their effectiveness across various subject areas.

  • Small language models can be executed directly on-device without requiring GPU hardware.
  • The system responds within a time frame that is acceptable for on-device deployment.
  • The benchmarks used both open-source and proprietary datasets covering diverse question types.

The findings demonstrate that RAG systems powered by small language models are viable for on-device execution, offering a practical alternative to large models that typically require significant computational resources.
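
As an illustration of what the generation stage involves, the following minimal sketch assembles a prompt from already-retrieved passages and times a single generation call. It is not the study's benchmark code: the names build_prompt, timed_generate, and stub_model are hypothetical, the prompt layout is an assumption, and stub_model is a placeholder standing in for whatever small language model runs on the device.

```python
import time
from typing import Callable, List, Tuple


def build_prompt(question: str, passages: List[str]) -> str:
    """Join retrieved passages and the question into one prompt (assumed layout)."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def timed_generate(
    generate: Callable[[str], str], question: str, passages: List[str]
) -> Tuple[str, float]:
    """Run the generation stage once and return (answer, seconds elapsed)."""
    prompt = build_prompt(question, passages)
    start = time.perf_counter()
    answer = generate(prompt)
    elapsed = time.perf_counter() - start
    return answer, elapsed


def stub_model(prompt: str) -> str:
    """Hypothetical placeholder for an on-device small language model."""
    return "stub answer"


if __name__ == "__main__":
    answer, seconds = timed_generate(
        stub_model,
        "What does RAG stand for?",
        ["RAG stands for Retrieval-Augmented Generation."],
    )
    print(f"answer={answer!r} latency={seconds:.6f}s")
```

Passing the model in as a plain callable keeps the timing logic independent of any particular inference runtime, so the same harness could wrap different small models for comparison.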