Jun 30, 2018 Scientific research & Postgraduate Studies, ICT Engineering

Analysis of the ORTHOTEL Corpus: The Contribution of Automatic Treatment to the Classification of Spelling Errors

Authors

Véronique Aubergé, Nada Ghneim, Rahia Belrhali

Published in

Langue française, 124(1):90-103, December 1999

Abstract

The aim of this study is to present organized statistical data extracted from a large corpus of 15,000 forms showing spelling errors. This corpus, ORTHOTEL, was collected from Minitel users who were unsure how to spell a word. An automatic treatment was applied to the corpus to isolate and analyse the errors. Half of the forms in the corpus are correctly spelled, which indicates the users' degree of linguistic insecurity. An automatic text-to-phone system applied to the misspelled words shows that a large proportion are homophones of a correct word taken from a reference lexicon of 80,000 canonical forms. An alignment algorithm then classifies the orthographic transformations that account for the deviations from the reference lexicon.
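To make the idea of classifying orthographic transformations by alignment more concrete, here is a minimal sketch in Python. It is not the algorithm used in the paper, which the abstract does not describe in detail; it swaps in the generic sequence alignment from the standard library's `difflib`. The function name `classify_transformations` and the example word pairs are hypothetical and are not taken from ORTHOTEL.

```python
from difflib import SequenceMatcher


def classify_transformations(reference, observed):
    """Align a reference form with an observed form and label each difference.

    Returns a list of (kind, reference_segment, observed_segment) tuples, where
    kind is 'replace', 'delete' (the writer dropped letters) or 'insert'
    (the writer added letters). Identical segments are skipped.
    Illustrative only: this is generic difflib alignment, not the paper's method.
    """
    matcher = SequenceMatcher(None, reference, observed, autojunk=False)
    transformations = []
    for kind, r1, r2, o1, o2 in matcher.get_opcodes():
        if kind == "equal":
            continue
        transformations.append((kind, reference[r1:r2], observed[o1:o2]))
    return transformations


# Hypothetical examples: a dropped double consonant and a phonetic respelling.
print(classify_transformations("appeler", "apeler"))
print(classify_transformations("pharmacie", "farmacie"))
```

Counting the resulting tuples over every misspelled form would give the kind of frequency table of transformations that the abstract refers to, under the assumption that each misspelled form has already been paired with its reference form.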

Link to read full paper

https://doi.org/10.3406/lfr.1999.6308