Connecting Arabs: Bridging the Gap in Dialectal Speech Recognition

Researchers
Ali Ahmed, Shammur Chowdhury, Mohamed Afify, Wassim El-Hajj, Hazem Hajj, Mourad Abbas, Amir Hussein, Nada Ghneim, Mohammad Abushariah, and Assal Alqudah

Published in
Communications of the ACM, Volume 64, No. 4, pages 124-129, April 2021.

Abstract
Automatic speech recognition (ASR) is the process of converting speech into text. Over the decades, ASR has reached many milestones, thanks to advances in machine learning and low-cost computer hardware. As a result, the best systems for English achieve a single-digit word error rate (WER), and in some conversational tasks their performance is comparable to that of human transcribers. This has led researchers to debate whether machines have reached human parity in speech recognition. Arabic speech recognition, by contrast, still faces many challenges, even with such advanced techniques. These challenges stem from the language's rich dialectal variety, with Modern Standard Arabic (MSA) being the only standardized dialect. MSA is syntactically, morphologically, and phonologically grounded in Classical Arabic, the language of the Qur’an (Islam’s Holy Book). Lexically, however, it is much more modern. MSA is taught in schools across the Arab region and is the main language of news broadcasts, parliaments, and formal speech. This is one of the main reasons why MSA has been the primary choice for speech and language technology over the last two decades. The current WER for MSA ASR is about 13%; dialectal ASR is worse, with a WER averaging 30%.

Keywords: Arabic dialect, automatic speech recognition.

Link to read full paper
https://cacm.acm.org/magazines/2021/4/251361-connecting-arabs/fulltext |