Ancient Ge’ez and Amharic Manuscript Recognition Using Deep Learning

Desiyalew, Haregu Bizuayehu

dc.contributor.author	Desiyalew, Haregu Bizuayehu
dc.date.accessioned	2022-09-16T05:39:33Z
dc.date.available	2022-09-16T05:39:33Z
dc.date.issued	2022-07
dc.identifier.uri	http://etd.dbu.edu.et:80/handle/123456789/1095
dc.description.abstract	Nowadays, character recognition is one of the most active research areas in computer vision and its applications. It is the process of detecting, extracting, and recognizing characters in document images and converting them into a machine-readable format. Document images may be handwritten or machine-printed. This study focuses on ancient Ethiopian manuscripts written in the Ethiopic script, which belong to the handwritten form. These documents contain highly relevant cultural and religious knowledge of ancient Ethiopians, but access to that knowledge is limited in place and time, and if the documents were destroyed by human action or natural disaster, the knowledge they contain could be lost. To address these problems, scholars have conducted various studies, including image digitization and character recognition. However, challenges remain: the cursiveness of the writing, inconsistency in writing, non-uniform spacing between lines, words, and characters, and the morphological similarity of characters. In this study, the image processing stages were implemented in Python 3.10.4, following a design science research methodology. The researcher first collected manuscript document images, binarized them using Otsu's global thresholding algorithm, removed noise with a bi_level noise filter algorithm, and then performed image segmentation at the line, word, and character levels. After segmentation, the researcher selected a total of 39,084 character images for dataset preparation, drawn from 705 document images belonging to 11 different manuscripts. 
This was followed by two experiments, one using a convolutional neural network (CNN) and one using a hybrid of a CNN and a bidirectional LSTM (BiLSTM), each run under two conditions, a 70:30 and an 80:20 training/testing split, with different parameters and hyperparameters. The hybrid CNN-BiLSTM model performed best under the second condition (80:20 training/testing split), trained for 15 epochs with a learning rate of 0.0001, achieving 97.46% training accuracy, 90.86% validation accuracy, and 30.1% testing accuracy. Manuscript recognition performance is strongly influenced by character morphology and by over-segmentation.	en_US
dc.language.iso	en	en_US
dc.subject	Convolutional Neural Network, Bidirectional Long Short-Term Memory, Manuscript Recognition, Ge’ez Manuscript, Amharic Manuscript, Computer Vision	en_US
dc.title	Ancient Ge’ez and Amharic Manuscript Recognition Using Deep Learning	en_US
dc.type	Thesis	en_US
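The abstract names Otsu's global thresholding as the binarization step and line-level segmentation as a later stage, without detailing either. The following is a minimal, hypothetical sketch of how those two stages can fit together; it is not the thesis's code. Assumptions not taken from the record: the page is a list of rows of 8-bit grayscale integers, ink is darker than the background, and lines are separated by at least one ink-free row (a horizontal projection profile). The bi_level noise filter and the word- and character-level segmentation are not reproduced.

```python
"""Hypothetical sketch: Otsu global thresholding, then horizontal
projection-profile line segmentation (the segmentation method is an
assumption; the thesis record does not specify it)."""
from typing import List, Tuple

Image = List[List[int]]  # rows of 8-bit grayscale values (0..255)


def otsu_threshold(image: Image) -> int:
    """Return the threshold t maximizing between-class variance.

    Pixels <= t form the dark (ink) class, pixels > t the background class.
    """
    hist = [0] * 256
    for row in image:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # count of pixels with value <= t
    sum0 = 0  # intensity sum of pixels with value <= t
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty: variance undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance up to the constant factor 1 / total**2,
        # which does not change the argmax.
        between_var = w0 * w1 * (mu0 - mu1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(image: Image) -> List[List[int]]:
    """Map ink pixels (<= Otsu threshold) to 1 and background to 0."""
    t = otsu_threshold(image)
    return [[1 if p <= t else 0 for p in row] for row in image]


def segment_lines(binary: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (start_row, end_row_exclusive) spans of consecutive ink rows."""
    spans: List[Tuple[int, int]] = []
    start = None
    for y, row in enumerate(binary):
        has_ink = any(row)
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            spans.append((start, y))
            start = None
    if start is not None:
        spans.append((start, len(binary)))
    return spans


if __name__ == "__main__":
    W, B = 220, 30  # background and ink gray levels
    page = [
        [W, W, W, W],
        [W, B, B, W],
        [W, B, W, W],
        [W, W, W, W],
        [B, W, B, W],
        [W, W, W, W],
    ]
    print(otsu_threshold(page))           # 30
    print(segment_lines(binarize(page)))  # [(1, 3), (4, 5)]
```

A plain projection profile works only when lines do not touch; with the non-uniform line spacing the abstract describes, a real pipeline would likely need a tolerance for small ink counts in gap rows or a smoothing step.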