Machine Learning in Document Analysis and Recognition

Author
Simone Marinai, Hiromichi Fujisawa
Publication Year
2008
Publisher
Springer
Language
English
Document Type
Book
Faculty / Subject Heading
Engineering

The objective of Document Analysis and Recognition (DAR) is to recognize the text and graphical components of a document and to extract information. With first papers dating back to the 1960’s, DAR is a mature but still growing research field with consolidated and known techniques. Optical Character Recognition (OCR) engines are some of the most widely recognized products of the research in this field, while broader DAR techniques are nowadays studied and applied to other industrial and office automation systems. In the machine learning community, one of the most widely known research problems addressed in DAR is recognition of unconstrained handwritten characters which has been frequently used in the past as a benchmark for evaluating machine learning algorithms, especially supervised classifiers.


Keywords: Engineering / Document Image Analysis and Recognition (DIAR) / Learning Strategies / Algorithm / Algorithms / Calculus / Classification / Cognition / Handwriting recognition / Image analysis / Layout / Learning / Machine learning / Neural networks / Self-organizing map / Verification