AHWR-Net: offline handwritten amharic word recognition using convolutional recurrent neural network
Research Article
Fetulhak Abdurahman1 · Eyob Sisay1 · Kinde Anlay Fante1
Received: 15 April 2021 / Accepted: 19 July 2021
© The Author(s) 2021 OPEN
Abstract
Amharic is the official language of the Federal Government of Ethiopia, with more than 27 million speakers. It
uses the Ethiopic script, which has 238 core and 27 labialized characters. It is a low-resource language, and only a few
attempts have been made so far at recognizing its handwritten text. Amharic handwritten text recognition is challenging
because of the very high similarity between characters. This paper presents a convolutional recurrent neural network
based offline handwritten Amharic word recognition system. The proposed framework comprises convolutional neural
networks (CNNs) for feature extraction from input word images, recurrent neural networks (RNNs) for sequence encoding,
and connectionist temporal classification (CTC) as a loss function. For robust feature extraction from handwritten
Amharic word images, we designed a custom CNN model and compared its performance with three state-of-the-art CNN
models, DenseNet-121, ResNet-50 and VGG-19, after modifying their architectures to fit our problem domain. We
conducted detailed experiments with different CNN and RNN architectures and input word image sizes, and applied data
augmentation techniques to enhance the performance of the proposed models. We prepared a handwritten Amharic word
dataset, HARD-I, which is publicly available to researchers. In experiments on various recognition models using our
dataset, our best-performing recognition model achieved a word error rate (WER) of 5.24% and a character error rate
(CER) of 1.15%. The proposed models achieve competitive performance compared with existing models for offline
handwritten Amharic word recognition.
Keywords Amharic · CNN · CTC · Handwritten · LSTM · Recognition
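The two error rates reported in the abstract can be illustrated with a minimal sketch. It assumes the commonly used definitions, which this section does not spell out: CER as the total character-level edit distance divided by the total number of ground-truth characters, and WER for isolated word images as the share of words whose prediction is not an exact match. The function names and the two sample words are illustrative only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def cer(refs, hyps):
    """Character error rate in percent (assumed definition, see lead-in)."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return 100.0 * total_edits / total_chars


def wer(refs, hyps):
    """Word error rate in percent for isolated words (assumed definition)."""
    wrong = sum(r != h for r, h in zip(refs, hyps))
    return 100.0 * wrong / len(refs)


# Illustrative ground truth and predictions; the second word has one wrong character.
refs = ["ሰላም", "ቤት"]
hyps = ["ሰላም", "ቤተ"]
print(cer(refs, hyps))  # 1 edit over 5 characters -> 20.0
print(wer(refs, hyps))  # 1 wrong word out of 2 -> 50.0
```

Because each Ethiopic character is a single code point, `len` on a Python string counts characters directly here.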
1 Introduction
The classic Amharic script was tailored to classical languages such as Ge’ez, so many new signs have been
derived for the modern Amharic language. As a subgroup within these, Amharic is one of the Ethiopian
Semitic languages that use the Ethiopic script. It is the official language of the Federal Government of Ethiopia
and has more than 27 million speakers.
* Fetulhak Abdurahman; Eyob Sisay; Kinde Anlay Fante | 1Faculty of Electrical and Computer Engineering, JiT, Jimma University, Jimma, Ethiopia.
SN Applied Sciences (2021) 3:760 | https://doi.org/10.1007/s42452-021-04742-x
The Amharic language has 34 base characters, from each of which six or more families (more than six if the base
character has labialized forms) are formed by changing the shape of the base character. Characters are similar in
shape both within and across families, and may differ by only a single stroke. There are a total of 238
core characters (34 base characters, each with six orders representing derived vocal sounds of the base character) and
27 labialized characters (which have two sounds) [1]. The Amharic alphabet, known as “Fidel”,
is written in tabular form with seven columns, where the first column represents the base characters and the
remaining six or more columns represent the derived characters, as shown in Fig. 1b. Some of the base characters
do not have a unique sound and can be used interchangeably in written documents. Like the Latin script, Amharic
text is written from left to right and top to bottom, with words separated by blank spaces. Unlike the
Latin script, it has no lowercase and uppercase letters. A sample handwritten text is shown in Fig. 1a.
Fig. 1 a A sample handwritten Amharic text. b Sample Amharic characters in tabular form with 7 columns. The red circles in the first row
indicate how the derived characters differ from the base character in column 1 by a single stroke.
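The tabular Fidel layout and the character counts given above can be sketched in a few lines of Python. The sketch relies on an assumption that does not come from the paper: in the Unicode Ethiopic block, the seven orders of a base character occupy consecutive code points starting at the base character (for example U+1200 to U+1206). The helper name `fidel_row` is hypothetical.

```python
import unicodedata

ORDERS_PER_BASE = 7  # the base form plus six derived orders, as in the Fidel table


def fidel_row(base_char):
    """Return the seven orders of a base character.

    Assumes the Unicode layout in which the orders of one base character
    are consecutive code points beginning at the base character itself.
    """
    start = ord(base_char)
    return [chr(start + i) for i in range(ORDERS_PER_BASE)]


# Counts stated in the text: 34 base characters x 7 orders, plus 27 labialized forms.
core_characters = 34 * ORDERS_PER_BASE
all_characters = core_characters + 27

row = fidel_row("\u1200")  # U+1200, ETHIOPIC SYLLABLE HA
for ch in row:
    print(ch, unicodedata.name(ch))
print(core_characters, all_characters)  # 238 265
```

Printing the Unicode names next to the glyphs makes it easy to confirm that each column corresponds to a different vowel order of the same base character.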
Text can be produced in either printed or handwritten form, and it varies with the printing font or the writing
style of the individual. A handwritten text recognition (HTR) system aims to transform images of handwritten text
into editable text. Despite the remarkable development of digital text processing technology, handwriting is still
used in day-to-day life. Printed and handwritten documents, however, are difficult to share, store, and manage
efficiently. Converting handwritten documents to digital text is therefore significant in various application
areas, including preserving historical heritage by converting historical documents, bank check processing, postal
address sorting, and many more.
HTR can be designed for either online or offline recognition. Online HTR is performed at the time of writing, where
temporal features can be captured from both the pen trajectory and the resulting image, whereas offline
HTR is performed only on a scanned image of a handwritten document after writing is completed. HTR is a
challenging task because of the cursive nature of handwriting, the complicated backgrounds of text images, the
myriad of different writing styles (the calligraphy) and other language-related issues. Specifically, the high
similarity in shape between inter-family or intra-family characters in the Amharic language
makes the HTR task extremely challenging.
Researchers have been working on the development of HTR systems for several decades. In traditional HTR
approaches, different image processing techniques were applied to segment the handwritten document into
lines, words or characters. After segmentation, hand-engineered features were extracted and hidden Markov models
(HMMs) [2, 3] were applied as sequence learning algorithms to represent the output as a character sequence.
However, HMMs are limited by the Markovian assumption: they learn text context information only
from the current state, which makes it challenging to model contextual effects.
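The Markovian assumption mentioned above can be written out for a first-order HMM. The symbols are introduced here for illustration and are not taken from the paper: s_t denotes the hidden state and x_t the observed feature vector at time step t.

```latex
% First-order Markov assumption: the next state depends only on the current state.
P(s_t \mid s_1, \dots, s_{t-1}) = P(s_t \mid s_{t-1})

% Output independence: an observation depends only on the state that emits it.
P(x_t \mid s_1, \dots, s_t,\; x_1, \dots, x_{t-1}) = P(x_t \mid s_t)
```

Under these two assumptions, context further back than one step can influence the current prediction only indirectly through the chain of states, which is why long-range contextual effects are hard to capture.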
Advances in deep learning algorithms have led to their use for handwriting recognition tasks. Recurrent neural
networks (RNNs) and multidimensional RNNs (MDRNNs) can overcome the limitations of traditional
HMM-based sequence learning models. Although they are not bound by the Markovian assumption, conventional
RNNs require segmented input at each time step because of the behavior of their loss function. Researchers
attempted to solve this problem with hybrid models that integrate HMMs with RNNs [4]. However, the HMMs in
hybrid models prevent the RNNs from being used to their full potential. The connectionist temporal
classification (CTC) algorithm [5] eliminates the need for segmented inputs in traditional
RNN-based models by decoding output sequences without
a one-to-one correspondence between input and output sequences.
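How a CTC output can be shorter than the input sequence is easiest to see in the collapsing rule used at decoding time: merge repeated labels, then drop the blank label. The sketch below shows greedy (best-path) decoding, which is one common choice and not necessarily the decoder used in this paper. The blank index of 0 and the function name are assumptions made for illustration.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank label


def ctc_greedy_decode(frame_scores, blank=BLANK):
    """Collapse per-frame scores into a label sequence.

    frame_scores holds one list of class scores per time step.
    """
    # Pick the highest-scoring label at every time step.
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]
    # Merge consecutive repeats, then remove blanks.
    merged = [label for label, _ in groupby(best_path)]
    return [label for label in merged if label != blank]


# Seven frames over three classes (blank, label 1, label 2).
frames = [
    [0.1, 0.8, 0.1],  # 1
    [0.2, 0.7, 0.1],  # 1 (repeat, merged)
    [0.9, 0.1, 0.0],  # blank (separates the two 1s)
    [0.1, 0.6, 0.3],  # 1
    [0.1, 0.2, 0.7],  # 2
    [0.0, 0.3, 0.7],  # 2 (repeat, merged)
    [0.8, 0.1, 0.1],  # blank
]
print(ctc_greedy_decode(frames))  # [1, 1, 2]
```

Seven input frames yield three output labels, and the blank between the first two frames of label 1 and the fourth frame is what allows a doubled character to survive the merge.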
Even (...truncated)