AHWR-Net: offline handwritten Amharic word recognition using convolutional recurrent neural network

Research Article

Fetulhak Abdurahman¹ · Eyob Sisay¹ · Kinde Anlay Fante¹

Received: 15 April 2021 / Accepted: 19 July 2021
© The Author(s) 2021 (open access)

Abstract

Amharic (አማርኛ) is the official language of the Federal Government of Ethiopia, with more than 27 million speakers. It uses the Ethiopic script, which has 238 core and 27 labialized characters. It is a low-resourced language, and only a few attempts have been made so far at recognizing its handwritten text. Amharic handwritten text recognition is challenging because of the very high similarity between characters. This paper presents a convolutional recurrent neural network-based offline handwritten Amharic word recognition system. The proposed framework comprises convolutional neural networks (CNNs) for feature extraction from input word images, recurrent neural networks (RNNs) for sequence encoding, and connectionist temporal classification (CTC) as a loss function. We designed a custom CNN model and compared its performance with three state-of-the-art CNN models (DenseNet-121, ResNet-50 and VGG-19), whose architectures we modified to fit our problem domain, for robust feature extraction from handwritten Amharic word images. We conducted detailed experiments with different CNN and RNN architectures and input word image sizes, and applied data augmentation techniques to enhance the performance of the proposed models. We prepared a handwritten Amharic word dataset, HARD-I, which is publicly available to researchers. In experiments with various recognition models on our dataset, our best-performing model achieved a word error rate (WER) of 5.24% and a character error rate (CER) of 1.15%. The proposed models achieve competitive performance compared to existing models for offline handwritten Amharic word recognition.
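The two reported metrics can be made concrete with a short sketch. This is a minimal illustration and not the authors' evaluation code. It assumes the common definitions: CER is the total character-level edit distance divided by the total number of reference characters, and, because every sample is a single word image, WER is taken as the share of words that are not recognized exactly. The function names `edit_distance`, `cer` and `wer` are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: insertions, deletions and substitutions cost 1 each."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference characters into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(pairs):
    """Character error rate over (reference, hypothesis) word pairs."""
    edits = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    chars = sum(len(ref) for ref, _ in pairs)
    return edits / chars


def wer(pairs):
    """Word error rate, assuming one word per image: share of inexact outputs."""
    wrong = sum(1 for ref, hyp in pairs if ref != hyp)
    return wrong / len(pairs)
```

Under these assumed definitions a single wrong character makes the whole word count as an error, which is why WER is normally higher than CER on the same predictions.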
Keywords Amharic · CNN · CTC · Handwritten · LSTM · Recognition

* Fetulhak Abdurahman; Eyob Sisay; Kinde Anlay Fante | ¹Faculty of Electrical and Computer Engineering, JiT, Jimma University, Jimma, Ethiopia.
SN Applied Sciences (2021) 3:760 | https://doi.org/10.1007/s42452-021-04742-x

1 Introduction

The classic Amharic script was tailored to classical languages such as Ge’ez, and so many new signs have been derived for the modern Amharic language. As a subgroup within these, Amharic (አማርኛ) is one of the Ethiopian Semitic languages that use the Ethiopic script. It is the official language of the Federal Government of Ethiopia and has more than 27 million speakers. Amharic has 34 base characters, each of which gives rise to a family of six derived characters (or more, if the base character has labialized forms) formed by changing the shape of the base character. Characters across and within families can be very similar in shape, differing by only a single stroke. There are a total of 238 core characters (34 base characters, each with six orders representing derived vocal sounds of the base character) and 27 labialized characters (which have two sounds) [1]. The Amharic alphabet, known as “Fidel”, is written in tabular form with seven columns, where the first column holds the base characters and the remaining six or more columns hold the derived characters, as shown in Fig. 1b. Some of the base characters do not have a unique sound and can be used interchangeably in written documents.

Fig. 1 (a) A sample handwritten Amharic text. (b) Sample Amharic characters in tabular form with seven columns. The red circles in the first row indicate how the derived characters differ from the base character in column 1 by a single stroke.
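The character counts quoted above follow from simple arithmetic, and the seven-column layout of the Fidel can be mimicked in a few lines. The sketch below is illustrative only. It assumes (this is not stated in the paper) that the seven orders of a family occupy consecutive code points in Unicode's Ethiopic block, which holds for regular families such as the one starting at U+1200 but should be checked before relying on it for every family. All names are hypothetical.

```python
BASE_CHARACTERS = 34   # base characters, as stated in the text
ORDERS_PER_BASE = 7    # the base form plus six derived orders
LABIALIZED = 27        # labialized characters, as stated in the text


def core_character_count():
    """34 families x 7 orders = 238 core characters."""
    return BASE_CHARACTERS * ORDERS_PER_BASE


def total_character_count():
    """Core plus labialized characters."""
    return core_character_count() + LABIALIZED


def fidel_row(base):
    """One row of the Fidel table: the base character and its six derived orders.

    Assumption: the orders sit at consecutive Unicode code points after `base`.
    """
    start = ord(base)
    return [chr(start + k) for k in range(ORDERS_PER_BASE)]
```

For example, `fidel_row("ሀ")` yields the seven characters of the first family, and `total_character_count()` gives the 265 characters implied by the figures in the text.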
Like Latin script, Amharic text is written from left to right, with words separated by blank space and lines running from top to bottom. Unlike the Latin script, it has no lowercase and uppercase letters. A sample handwritten text is shown in Fig. 1a. Text can be produced in printed or handwritten form, and it varies with the printing font or with the writing style of the individual. A handwritten text recognition (HTR) system aims to transform images of handwritten text into editable text. Despite the remarkable development of digital text processing technology, handwriting is still used in day-to-day life. Printed and handwritten documents are difficult to share, store, and manage efficiently. Converting handwritten documents is therefore valuable in many application areas, including preserving historical heritage by digitizing historical documents, bank check processing, and postal address sorting.

HTR can be designed for either online or offline recognition. Online HTR is performed at the time of writing, so temporal features can be captured from both the pen trajectory and the resulting image, whereas offline HTR works only from a scanned image of a handwritten document after writing is completed. HTR is a challenging task because of the cursive nature of handwriting, the complicated backgrounds of text images, the wide variety of writing styles, and other language-related issues. In Amharic specifically, the high similarity in shape between characters across and within families makes the HTR task extremely challenging. Researchers have been working on HTR systems for several decades. In traditional HTR approaches, different image processing techniques were applied to segment the handwritten document into lines, words or characters.
After segmentation, hand-engineered features were extracted and hidden Markov models (HMMs) [2, 3] were applied as sequence learning algorithms to represent the output as a character sequence. However, HMMs are limited by the Markov assumption: context is learned only from the current state, which makes it difficult to model contextual effects. Advances in deep learning have led to its use for handwriting recognition tasks. Recurrent neural networks (RNNs) and multidimensional RNNs (MDRNNs) can overcome the limitations of traditional HMM-based sequence learning models. Although they are not bound by the Markov assumption, conventional RNNs require segmented input at each time step because of the way their loss function is defined. Researchers attempted to solve this problem with hybrid models that integrate HMMs with RNNs [4], but the HMMs in such hybrid models prevent the RNNs from being used to their full potential. The connectionist temporal classification (CTC) algorithm [5] eliminates the need for segmented inputs in traditional RNN-based models by decoding output sequences without requiring a one-to-one correspondence between input and output sequences. Even (...truncated)
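The absence of a one-to-one correspondence between input frames and output characters can be illustrated with the collapsing rule that CTC uses to map a frame-level label path to an output string: merge runs of repeated labels, then remove the blank symbol. The sketch below shows only this mapping, applied to a best-path (greedy) labelling. It is not the CTC loss itself and is not necessarily the decoder used in the paper. The blank symbol `"-"` and the function name `ctc_collapse` are assumptions made for illustration.

```python
from itertools import groupby

BLANK = "-"  # hypothetical blank symbol; real systems reserve a label index for it


def ctc_collapse(frame_labels, blank=BLANK):
    """Collapse a per-frame label path into an output string.

    Step 1: merge consecutive repeats of the same label.
    Step 2: drop blanks, which separate genuine repeated characters.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != blank)
```

Many different paths collapse to the same word: both `"--ሰሰ-ላ--ምም-"` and `"ሰሰሰላላም"` give `"ሰላም"`, so the network never has to be told which frame belongs to which character. A blank between two identical labels keeps them distinct, so `"aa-a"` collapses to `"aa"` rather than `"a"`.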