Analysis of the ORTHOTEL Corpus: The Contribution of Automatic Treatment to the Classification of Spelling Errors

Authors
Véronique Aubergé, Nada Ghneim, Rahia Belrhali

Published in
Langue française, 124(1):90-103, December 1999

Abstract
The aim of this study is to present organized statistical data extracted from ORTHOTEL, a large corpus of 15,000 forms submitted as spelling queries. The corpus was collected from Minitel users who were unsure how to spell words. Automatic processing was applied to the corpus to isolate and analyse the errors. Half of the forms in the corpus are correctly spelled, which indicates the users' degree of linguistic insecurity. Applying an automatic text-to-phone system to the misspelled words shows that a large proportion of them are homophones of a correct word from a reference lexicon of 80,000 canonical forms. An alignment algorithm was used to classify the orthographic transformations that account for deviations from the reference lexicon.
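The abstract does not say which alignment algorithm was used. The sketch below is therefore only an illustration. It assumes a standard character-level Levenshtein (Wagner-Fischer) alignment between each misspelled form and its reference spelling. The aligned positions are then counted as substitutions, insertions or deletions. The function names `align` and `classify` are hypothetical, and the word pairs in the usage example are illustrative, not taken from ORTHOTEL.

```python
from collections import Counter


def align(misspelled: str, reference: str) -> list[tuple[str, str, str]]:
    """Character-level Levenshtein alignment with backtrace (assumed method).

    Returns (operation, char_in_misspelled, char_in_reference) triples, where
    operation is 'match', 'sub', 'ins' (extra character written by the user)
    or 'del' (reference character omitted by the user).
    """
    n, m = len(misspelled), len(reference)
    # dist[i][j] = edit distance between misspelled[:i] and reference[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if misspelled[i - 1] == reference[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or substitution
                dist[i - 1][j] + 1,         # extra character in misspelled form
                dist[i][j - 1] + 1,         # reference character omitted
            )

    # Walk back from the bottom-right cell to recover one optimal alignment,
    # preferring the diagonal move when several moves are equally cheap.
    ops: list[tuple[str, str, str]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if misspelled[i - 1] == reference[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                op = "match" if cost == 0 else "sub"
                ops.append((op, misspelled[i - 1], reference[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("ins", misspelled[i - 1], ""))
            i -= 1
        else:
            ops.append(("del", "", reference[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def classify(pairs: list[tuple[str, str]]) -> Counter:
    """Tally the non-identity transformations over (misspelled, reference) pairs."""
    counts: Counter = Counter()
    for wrong, right in pairs:
        for op, written, expected in align(wrong, right):
            if op != "match":
                counts[(op, written, expected)] += 1
    return counts


if __name__ == "__main__":
    print(classify([("batiment", "bâtiment"), ("apeler", "appeler")]))
```

Run on the two illustrative pairs, it reports one substitution of `a` for `â` and one omitted `p`. Ties are broken in favour of the diagonal move, so only one of several equally cheap alignments is returned. For a doubled consonant, the choice of which `p` counts as omitted is therefore arbitrary, although the resulting transformation class is the same.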

Full paper
https://doi.org/10.3406/lfr.1999.6308