Researchers present a multimodal university chatbot that uses retrieval-augmented generation (RAG) to give stakeholders timely access to institutional information. The system pairs a large language model with semantic retrieval to generate responses grounded in institutional resources such as the university handbook.
- Accepts text and image queries through a vision-language model.
- Applies quantized inference for rapid deployment on constrained hardware.
- Uses a scalable backend built with FastAPI and a responsive frontend developed with Next.js.
- Reduces the hallucination rate from 31.7% to 6.6% relative to existing systems.
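
To put the reported hallucination figures in perspective, the drop can be expressed both in percentage points and as a relative reduction. The short calculation below uses only the two rates quoted above; the function and variable names are illustrative.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a rate fell, relative to its starting value."""
    return (before - after) / before


baseline_rate = 31.7  # percent, as reported in the summary
grounded_rate = 6.6   # percent, as reported in the summary

absolute_drop = baseline_rate - grounded_rate
relative_drop = relative_reduction(baseline_rate, grounded_rate)

# Prints: 25.1 percentage points, 79.2% relative
print(f"{absolute_drop:.1f} percentage points, {relative_drop:.1%} relative")
```

In other words, roughly four out of five hallucinated responses are eliminated under the reported numbers.
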
The quantitative evaluation confirms the effectiveness of retrieval grounding, and multimodal testing shows strong satisfaction scores for both text and image queries, despite longer response times for image inputs.
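
The retrieval-grounding idea can be illustrated with a minimal sketch. This is not the authors' implementation: a bag-of-words vector stands in for a real semantic embedding model, the handbook passages are made-up placeholders, and every name here (`embed`, `retrieve`, `build_prompt`, `HANDBOOK`) is hypothetical. It only shows the general RAG pattern of ranking passages by similarity to the query and placing the best matches in the prompt sent to the language model.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a semantic embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query."""
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble a prompt that asks the model to answer from the context only."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below. "
        "If the answer is not there, say so.\n"
        f"Context:\n{joined}\n"
        f"Question: {query}"
    )


# Placeholder passages, invented for illustration (not from any real handbook).
HANDBOOK = [
    "Course registration opens two weeks before the start of each semester.",
    "The library is open from 8 am to 10 pm on weekdays.",
    "Students may appeal a grade within 30 days of its release.",
]

QUERY = "When does course registration open?"
top = retrieve(QUERY, HANDBOOK, k=1)
prompt = build_prompt(QUERY, top)
print(prompt)
```

Restricting the model to retrieved context is the mechanism generally credited with lowering hallucination in RAG systems; a production system would swap the toy `embed` for a learned embedding model and a vector index.
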