Moving Learning Machine towards Fast Real-Time Applications: A High-Speed FPGA-Based Implementation of the OS-ELM Training Algorithm
electronics
Article
Jose V. Frances-Villora †, * , Alfredo Rosado-Muñoz † , Manuel Bataller-Mompean † ,
Juan Barrios-Aviles † and Juan F. Guerrero-Martinez †
Processing and Digital Design Group, Department of Electronic Engineering, University of Valencia,
46100 Burjassot, Spain; (A.R.-M.); (M.B.-M.);
(J.B.-A.); (J.F.G.-M.)
* Correspondence:
† These authors contributed equally to this work.
Received: 19 October 2018; Accepted: 5 November 2018; Published: 7 November 2018
Abstract: Currently, there are some emerging online learning applications handling data streams
in real time. The On-line Sequential Extreme Learning Machine (OS-ELM) has been successfully
used in real-time condition prediction applications because of its good generalization performance
at an extreme learning speed, but the number of trainings per second (training frequency)
achieved in these continuous learning applications has to be further increased. This paper proposes
a performance-optimized implementation of the OS-ELM training algorithm when it is applied to
real-time applications. In this case, the natural way of feeding the training of the neural network is
one-by-one, i.e., training the neural network for each new incoming training input vector. Applying
this restriction, the computational needs are drastically reduced. An FPGA-based implementation of
the tailored OS-ELM algorithm is used to analyze, in a parameterized way, the level of optimization
achieved. We observed that the tailored algorithm drastically reduces the number of clock cycles
consumed for the training execution, down to approximately 1%. This performance enables high-speed
sequential training rates, such as a sequential training frequency of 14 kHz for an SLFN with 40 hidden
neurons, or 180 Hz for an SLFN with 500 hidden neurons. In practice, the proposed
implementation computes the training almost 100 times faster, or more, than other applications in
the literature. Besides, the number of clock cycles follows a quadratic complexity O(Ñ²), with Ñ the number
of hidden neurons, and is only weakly influenced by the number of input neurons. However, it shows
a pronounced sensitivity to data type precision even when facing small-size problems, which forces the use of
double-precision floating-point data types to avoid finite precision arithmetic effects. In addition, it has
been found that distributed memory is the limiting resource and, thus, it can be stated that current
FPGA devices can support OS-ELM-based on-chip learning with up to 500 hidden neurons. In conclusion,
the proposed hardware implementation of the OS-ELM offers great possibilities for on-chip learning in
portable systems and real-time applications where frequent and fast training is required.
Keywords: online sequential ELM; OS-ELM; FPGA; on-chip training; on-line learning; real-time
learning; hardware implementation; extreme learning machine
1. Introduction
There is a current trend to implement hardware on-chip learning for applications such as facial
recognition, pattern recognition and complex learning behaviors. As an example, ref. [1] used real-time
sequential learning in mobile devices for face recognition applications; ref. [2] proposed a real-time
Electronics 2018, 7, 308; doi:10.3390/electronics7110308
www.mdpi.com/journal/electronics
learning of neural networks for the prediction of future opponent robot coordinates; ref. [3] designed
an ASIC with on-chip learning to learn and extract features existing in input datasets, intended for embedded
vision applications; and ref. [4] implemented a real-time classifier for neurological signals.
The Extreme Learning Machine (ELM) algorithm possesses many aspects that make it suitable for
any real-time or custom hardware implementation. It has a reduced and fixed training time along with
an extremely fast learning speed that allows determinism in the computation time and, thus, a great
advantage compared to previous well-known training methods such as gradient descent [5]. The ELM
algorithm is based on a Single Layer Feedforward Neural Network (SLFN), using random hidden layer
weights and a linear adjustment for the output layer [6–8]. The result is a simple training procedure
that has been applied to a wide range of applications, such as electricity price prediction [9], prediction
of energy consumption [10], power disaggregation [11], soldering inspection [12], computation of
friction [13], non-linear control [14], fiber optic communications [15], or epileptic EEG detection [16].
However, the ELM algorithm is essentially a batch learning method usually running on a PC, and
only a few approaches use it on real-time hardware to compute the on-line working flow, as in [17],
where an embedded FPGA estimated the speed of a drive system.
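As an illustration of the batch procedure described above (random hidden layer weights followed by a linear adjustment of the output layer), a minimal sketch is given below. It is not the implementation discussed in this paper: the sigmoid activation, the uniform weight range, the toy regression target and the function names elm_train and elm_predict are assumptions chosen only for the example.

```python
import numpy as np

def elm_train(X, T, n_hidden, rng):
    """Batch ELM sketch: random hidden layer, least-squares output layer."""
    n_inputs = X.shape[1]
    # Hidden-layer weights and biases are drawn at random and never trained.
    W = rng.uniform(-1.0, 1.0, size=(n_inputs, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    # Hidden-layer output matrix (sigmoid activation is an assumption here).
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    # Output weights: linear least-squares fit via the pseudo-inverse of H.
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Forward pass of the trained SLFN."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))   # 200 samples, 2 input neurons
T = np.sin(X[:, :1]) + 0.5 * X[:, 1:]       # toy regression target (assumed)
W, b, beta = elm_train(X, T, n_hidden=20, rng=rng)
train_mse = float(np.mean((elm_predict(X, W, b, beta) - T) ** 2))
```

The whole dataset must be available before the pseudo-inverse is computed, which is why this form is a batch method.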
Liang et al. [18] proposed a modified version of the ELM, namely On-line Sequential ELM
(OS-ELM), best suited to handle incremental datasets, which is the most natural way of learning in
real-time contexts. This learning algorithm keeps the reduced and fixed training time of the original ELM,
allowing deterministic computation time, along with other prominent features such as very fast adaptation and
convergence speed, acceptance of input chunks of different sizes, high generalization capability, good
accuracy, high structural flexibility and only one operating parameter, the number of hidden nodes.
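To make the sequential scheme concrete, the sketch below shows the standard recursive least-squares form of the OS-ELM update for the particular case of one training vector per update. It is a hedged illustration, not the hardware design of this paper: the function names oselm_init and oselm_update_one are hypothetical, the hidden-layer outputs are replaced by random stand-in values, and the rank-1 update relies on the matrix P being symmetric. With a single sample, the matrix inversion of the general chunk update collapses to a scalar division, and each step costs on the order of Ñ² operations.

```python
import numpy as np

def oselm_init(H0, T0):
    """Initialization phase on a first block of hidden-layer outputs H0."""
    P = np.linalg.inv(H0.T @ H0)
    beta = P @ H0.T @ T0
    return P, beta

def oselm_update_one(P, beta, h, t):
    """One-by-one update: h has shape (n_hidden,), t has shape (n_outputs,)."""
    Ph = P @ h                           # matrix-vector product, O(n_hidden^2)
    denom = 1.0 + h @ Ph                 # scalar replaces the matrix inverse
    P = P - np.outer(Ph, Ph) / denom     # rank-1 update (assumes P symmetric)
    err = t - h @ beta                   # prediction error for this sample
    beta = beta + np.outer(P @ h, err)   # correct the output weights
    return P, beta

rng = np.random.default_rng(0)
n_hidden = 5
# Stand-in hidden-layer outputs and targets (assumed data for the example).
H = rng.uniform(0.0, 1.0, size=(60, n_hidden))
T = rng.uniform(-1.0, 1.0, size=(60, 1))

P, beta = oselm_init(H[:20], T[:20])     # initial block of 20 samples
for h, t in zip(H[20:], T[20:]):         # remaining samples arrive one by one
    P, beta = oselm_update_one(P, beta, h, t)

# Reference: batch least squares over all 60 samples.
beta_batch = np.linalg.lstsq(H, T, rcond=None)[0]
```

In exact arithmetic the sequentially updated weights coincide with the batch least-squares solution over all samples seen so far; in this double-precision example they agree to within rounding error.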
Diverse OS-ELM sequential learning applications have been proposed to date. As an example,
ref. [19] adapted an automatic gesture recognition model to new users, getting high recognition
accuracy. In a Wi-Fi based indoor positioning application, ref. [20] addressed the problem of adapting,
in a timely manner, to environmental dynamics; ref. [21] addressed the
fluctuation problem; and [22] handled the dimension changing problem caused by
the increase or decrease of the number of APs (Access Points). The authors of [23] developed a robust
safety-oriented autonomous cruise control based on the Model Predictive Control (MPC) technique;
ref. [24] addressed the pedestrian dead-reckoning problem in indoor localization; ref. [25] addressed
the problem of detecting attacks in the advanced metering infrastructure of a smart grid; and [26] used
OS-ELM to propose an algorithm for facial expression recognition. It can be stated that, nowadays,
OS-ELM is used to handle either sequential arrival of data, or large amounts of data.
However, there are currently emerging online learning applications which need real-time handling
of data streams. These applications use the OS-ELM in the strict real-time sense. As an example,
Chen et al. [27] used an ensemble of OS-ELMs and phase space reconstruction to recognize different
types of flow oscillations and accurately forecast the trend of monitored plant variables. It was (...truncated)