Jun 30, 2018 Scientific Research & Postgraduate Studies, ICT Engineering

Computational Methods to Vocalize Arabic Texts

Authors

Hani Safadi, Dr. Oumayma Dakkak, Dr. Nada Ghneim

Published in

May 2006

Abstract

Arabic has two kinds of vowels: long vowels, which are written as ordinary letters, and short vowels, which are written as diacritical marks above or below the letters. Short vowels are normally omitted in Arabic texts because readers can supply them and infer the meaning from their knowledge of the language and the context in which the words appear.
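
To make the problem concrete, the following is a minimal sketch of the inverse operation: removing short-vowel marks from vocalized text. It is not the paper's system. The class name DiacriticStripper and its method are hypothetical, and the sketch assumes the short-vowel marks are the Unicode code points U+064B through U+0652 (the tanween marks, fatha, damma, kasra, shadda, and sukun). It shows that two different vocalized words, kataba ("he wrote") and kutub ("books"), collapse to the same unvocalized string, which is the ambiguity a vocalizer has to resolve.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not taken from the paper.
final class DiacriticStripper {

    // Assumption: short-vowel marks occupy U+064B..U+0652 in the Unicode Arabic block.
    private static final Pattern SHORT_VOWEL_MARKS = Pattern.compile("[\u064B-\u0652]");

    private DiacriticStripper() {
    }

    // Returns the text with all short-vowel marks removed; letters are left untouched.
    static String stripDiacritics(String vocalized) {
        return SHORT_VOWEL_MARKS.matcher(vocalized).replaceAll("");
    }

    public static void main(String[] args) {
        // kataba: kaf + fatha, ta + fatha, ba + fatha
        String kataba = "\u0643\u064E\u062A\u064E\u0628\u064E";
        // kutub: kaf + damma, ta + damma, ba
        String kutub = "\u0643\u064F\u062A\u064F\u0628";

        String a = stripDiacritics(kataba);
        String b = stripDiacritics(kutub);

        // Both vocalized forms reduce to the same three bare letters.
        System.out.println(a.equals(b)); // prints "true"
    }
}
```

Stripping the marks is a simple character filter, while restoring them requires choosing among several valid vocalizations of the same letter sequence, which is why context and knowledge of the language are needed.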

However, with the widespread use of computers in linguistic applications, Arabic texts need to be supplied with short vowels before they can be analyzed. Search engines, text-to-speech engines, and text mining tools are just some of the applications that need Arabic text to be vocalized before processing.

In this paper, we present a new method for supplying those short vowels. The approach emphasizes unsupervised machine learning methods, because public Arabic corpora are not available. Arabic's rich morphology and diverse orthography present serious challenges for this approach. An algorithm is developed and a system is implemented in Java.

The techniques presented in this work can be applied to other Semitic languages, or to other languages that have the same problem.

Link to read full paper

https://www.researchgate.net/publication/228525486_Computational_Methods_to_Vocalize_Arabic_Texts