DEVELOPMENT OF SMART VOICE AGENT With case study (Libyan Voice Assistant)
Abstract
This paper presents an end-to-end voice assistant for Tripolitanian Libyan Arabic, a low-resource dialect that commercial ASR and NLP applications do not support. To address this gap, we built a demographically balanced, phonemically rich speech corpus of over 13,000 audio samples. The corpus contains both natural and semi-structured utterances and is annotated using the CODA* orthography for dialectal Arabic. Using this dataset, we trained the OpenAI Whisper model with Hugging Face Transformers, reducing the word error rate (WER) from 2.045 to 0.040. To handle smartphone commands and simple conversations in Tripolitanian Arabic, the ASR output is passed to a Rasa-based chatbot trained on intent-annotated queries. The chatbot achieved 100% intent accuracy and an entity F1-score of 0.998. Evaluation on standard ASR and NLU metrics validates this modular pipeline. These findings show that high-performance, dialect-specific voice interfaces can be built through domain-adapted training, data augmentation, and system integration. Future work includes extending the dataset to support speech synthesis in the Libyan dialect and providing broader support for Libyan dialects.
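For readers unfamiliar with the metric, WER is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the ASR hypothesis, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the evaluation code used in this work; the function name `word_error_rate` and the placeholder tokens are assumptions made for the example. It also shows one way a score can exceed 1, as the reported 2.045 does: a hypothesis with many inserted words can accumulate more errors than the reference has words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokenization is plain
    whitespace splitting, with no text normalization applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens stand in for real transcripts.
exact = word_error_rate("w1 w2 w3 w4", "w1 w2 w3 w4")
one_substitution = word_error_rate("w1 w2 w3 w4", "w1 x w3 w4")
many_insertions = word_error_rate("w1 w2", "w1 x w2 y z")
```

Here the third call compares a two-word reference against a five-word hypothesis containing three inserted words, so the ratio is greater than 1. In practice, Arabic transcripts are usually normalized (for example, diacritics and punctuation handling) before scoring; that step is omitted here.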