Dziri Voicebot: End-to-End Speech-to-Speech System for Algerian Dialect

The paper introduces Dziri Voicebot, an end-to-end speech-to-speech conversational system designed for the low-resource Algerian Dialect. This work extends previous text-based dialogue modeling efforts by Bechiri and Lanasri to full speech-based interaction. The proposed modular pipeline integrates automatic speech recognition (ASR), natural language understanding (NLU), retrieval-augmented generation (RAG), and text-to-speech (TTS) synthesis. Dedicated datasets were constructed for the telecom domain and used to fine-tune a pretrained model for each component. The ASR component adapts a pretrained Whisper model, while the NLU module combines transformer embeddings with a task-oriented dialogue framework. A neural TTS system was trained on a newly collected dialectal corpus to generate spoken responses. Experimental results show strong performance across all components, including low word error rates for ASR and high intent classification scores for NLU.
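
The modular structure described above, where each stage consumes the output of the previous one, can be sketched roughly as follows. This is only an illustrative outline under stated assumptions, not the authors' implementation: the class `VoicebotPipeline`, the stage signatures, and every `stub_*` function are hypothetical names introduced here, and the stubs stand in for the fine-tuned models (for example, the stub ASR simply decodes bytes instead of running Whisper).

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stage signatures; none of these names come from the paper.
ASR = Callable[[bytes], str]                  # audio -> transcript
NLU = Callable[[str], str]                    # transcript -> intent label
Retriever = Callable[[str, str], List[str]]   # (transcript, intent) -> passages
Generator = Callable[[str, List[str]], str]   # (transcript, passages) -> reply text
TTS = Callable[[str], bytes]                  # reply text -> audio


@dataclass
class VoicebotPipeline:
    """Chains the four stages: ASR -> NLU -> retrieval + generation -> TTS."""
    asr: ASR
    nlu: NLU
    retrieve: Retriever
    generate: Generator
    tts: TTS

    def respond(self, audio: bytes) -> bytes:
        transcript = self.asr(audio)                  # speech to text
        intent = self.nlu(transcript)                 # intent classification
        passages = self.retrieve(transcript, intent)  # fetch supporting text
        reply = self.generate(transcript, passages)   # grounded text reply
        return self.tts(reply)                        # text to speech


# Stub stages so the sketch runs without any model. All data is invented.
def stub_asr(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the audio bytes are the transcript


def stub_nlu(text: str) -> str:
    return "check_balance" if "balance" in text else "other"


def stub_retrieve(text: str, intent: str) -> List[str]:
    knowledge_base = {"check_balance": ["Your balance is shown in the customer app."]}
    return knowledge_base.get(intent, [])


def stub_generate(text: str, passages: List[str]) -> str:
    return passages[0] if passages else "Sorry, I did not understand."


def stub_tts(text: str) -> bytes:
    return text.encode("utf-8")  # pretend the encoded text is synthesized audio


bot = VoicebotPipeline(stub_asr, stub_nlu, stub_retrieve, stub_generate, stub_tts)
print(bot.respond(b"what is my balance"))
```

Keeping each stage behind a plain callable means a component can be swapped (for instance, a different ASR checkpoint) without touching the rest of the chain, which is the main practical benefit of a modular design over a single monolithic model.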