Moving Learning Machine towards Fast Real-Time Applications: A High-Speed FPGA-Based Implementation of the OS-ELM Training Algorithm

Electronics, Nov 2018

Currently, there are emerging online learning applications that handle data streams in real time. The On-line Sequential Extreme Learning Machine (OS-ELM) has been successfully used in real-time condition prediction applications because of its good generalization performance at an extremely fast learning speed, but the number of trainings per second (training frequency) achieved in these continuous learning applications has to be further improved. This paper proposes a performance-optimized implementation of the OS-ELM training algorithm for real-time applications. In this case, the natural way of feeding the training of the neural network is one-by-one, i.e., training the neural network for each new incoming training input vector. Applying this restriction drastically reduces the computational needs. An FPGA-based implementation of the tailored OS-ELM algorithm is used to analyze, in a parameterized way, the level of optimization achieved. We observed that the tailored algorithm drastically reduces the number of clock cycles consumed by the training execution, down to approximately 1%. This performance enables high-speed sequential training rates, such as a sequential training frequency of 14 kHz for an SLFN with 40 hidden neurons, or 180 Hz for an SLFN with 500 hidden neurons. In practice, the proposed implementation computes the training almost 100 times faster, or more, than other applications reported in the literature. In addition, the number of clock cycles follows a quadratic complexity $O(\tilde{N}^2)$, where $\tilde{N}$ is the number of hidden neurons, and is only weakly influenced by the number of input neurons. However, the implementation shows a pronounced sensitivity to data type precision even for small-size problems, which forces the use of double-precision floating-point data types to avoid finite-precision arithmetic effects. It has also been found that distributed memory is the limiting resource and, thus, it can be stated that current FPGA devices can support OS-ELM-based on-chip learning of up to 500 hidden neurons. In conclusion, the proposed hardware implementation of the OS-ELM offers great possibilities for on-chip learning in portable systems and real-time applications where frequent and fast training is required.
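
The one-by-one training described in the abstract can be illustrated in software. The following is a minimal NumPy sketch of the standard OS-ELM recursion with a chunk size of one, not the authors' FPGA design. The class name `OneByOneOSELM`, the sigmoid activation, the uniform weight range and the small `ridge` regularizer are assumptions made for illustration. With a single sample per update, the matrix inverse of the general chunk update collapses to a scalar division, and the dominant cost is a matrix-vector product plus an outer product, which is consistent with the quadratic dependence on the number of hidden neurons stated above.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class OneByOneOSELM:
    """Illustrative OS-ELM for an SLFN, updated with one sample at a time."""

    def __init__(self, n_inputs, n_hidden, n_outputs, seed=0):
        rng = np.random.default_rng(seed)
        # Random hidden layer, fixed after creation (never trained), as in ELM.
        # The uniform range [-1, 1] is an assumption of this sketch.
        self.W = rng.uniform(-1.0, 1.0, size=(n_inputs, n_hidden))
        self.b = rng.uniform(-1.0, 1.0, size=n_hidden)
        self.P = None  # inverse correlation matrix, set by init_batch
        self.beta = np.zeros((n_hidden, n_outputs))  # output weights

    def hidden(self, X):
        """Hidden-layer outputs, one row per input vector."""
        return sigmoid(np.atleast_2d(X) @ self.W + self.b)

    def init_batch(self, X0, T0, ridge=1e-8):
        """Initial batch phase: regularized least squares on a first chunk.

        The ridge term is an assumption added to keep the matrix invertible.
        """
        H0 = self.hidden(X0)
        self.P = np.linalg.inv(H0.T @ H0 + ridge * np.eye(H0.shape[1]))
        self.beta = self.P @ H0.T @ T0

    def train_one(self, x, t):
        """Sequential phase for a single sample (x, t)."""
        h = self.hidden(x).ravel()                    # shape (n_hidden,)
        Ph = self.P @ h                               # O(n_hidden^2)
        # Rank-1 update of P; the usual matrix inverse is a scalar division.
        # Uses the symmetry of P: P h h^T P == outer(Ph, Ph).
        self.P -= np.outer(Ph, Ph) / (1.0 + h @ Ph)   # O(n_hidden^2)
        err = np.atleast_1d(t) - h @ self.beta        # prediction error
        self.beta += np.outer(self.P @ h, err)        # O(n_hidden * n_outputs)

    def predict(self, X):
        return self.hidden(X) @ self.beta
```

A typical use would call `init_batch` once on a small initial chunk and then call `train_one` for every new incoming sample. NumPy defaults to 64-bit floats, which matches the double-precision requirement noted in the abstract; this sketch does not reproduce the finite-precision effects the paper studies.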

Article PDF: https://www.mdpi.com/2079-9292/7/11/308/pdf

Jose V. Frances-Villora †,*, Alfredo Rosado-Muñoz †, Manuel Bataller-Mompean †, Juan Barrios-Aviles † and Juan F. Guerrero-Martinez †

Processing and Digital Design Group, Department of Electronic Engineering, University of Valencia, 46100 Burjassot, Spain
* Corresponding author.
† These authors contributed equally to this work.

Received: 19 October 2018; Accepted: 5 November 2018; Published: 7 November 2018

Electronics 2018, 7, 308; doi:10.3390/electronics7110308

Keywords: online sequential ELM; OS-ELM; FPGA; on-chip training; on-line learning; real-time learning; hardware implementation; extreme learning machine

1. Introduction

There is a current trend to implement hardware on-chip learning for applications such as facial recognition, pattern recognition and complex learning behaviors. As an example, ref. [1] used real-time sequential learning in mobile devices for face recognition applications; ref. [2] proposed real-time learning of neural networks for the prediction of future opponent robot coordinates; ref. [3] designed an ASIC with on-chip learning to learn and extract features existing in input datasets, intended for embedded vision applications; and ref. [4] implemented a real-time classifier for neurological signals.

The Extreme Learning Machine (ELM) algorithm possesses many aspects that make it suitable for any real-time or custom hardware implementation.
It has a reduced and fixed training time along with an extremely fast learning speed that allows determinism in the computation time and, thus, a great advantage compared to previous well-known training methods such as gradient descent [5]. The ELM algorithm is based on a Single Layer Feedforward Neural Network (SLFN), using random hidden layer weights and a linear adjustment for the output layer [6–8]. The result is a simple training procedure that has been applied to a wide range of applications such as electricity price prediction [9], prediction of energy consumption [10], power disaggregation [11], soldering inspection [12], computation of friction [13], non-linear control [14], fiber optic communications [15], or epileptic EEG detection [16].

However, the ELM algorithm is essentially a batch learning method usually run on a PC, and only some approaches use it on real-time hardware to compute the on-line working flow, as in [17], where an embedded FPGA estimated the speed for a drive system. Liang et al. [18] proposed a modified version of the ELM, namely the On-line Sequential ELM (OS-ELM), best suited to handle incremental datasets, which is the most natural way of learning in real-time contexts. This learning algorithm keeps the reduced and fixed training time of the original ELM, allowing deterministic computation time along with other prominent features such as very fast adaptation and convergence speed, acceptance of input chunks of different sizes, high generalization capability, good accuracy, high structural flexibility and only one operating parameter, the number of hidden nodes.

Diverse OS-ELM sequential learning applications have been proposed to date. As an example, ref. [19] adapted an automatic gesture recognition model to new users, achieving high recognition accuracy. In a Wi-Fi based indoor positioning application, ref. [20] addressed the problem of adapting, in a timely manner, to environmental dynamics; ref. [21] addressed the fluctuation problem, and [22] handled the dimension changing problem caused by the increase or decrease of the number of APs (Access Points). In [23], the authors developed a robust safety-oriented autonomous cruise control based on the Model Predictive Control (MPC) technique; ref. [24] addressed the pedestrian dead-reckoning problem in indoor localization; ref. [25] addressed the problem of detecting attacks in the advanced metering infrastructure of a smart grid; and [26] used OS-ELM to propose an algorithm for facial expression recognition.

It can be stated that, nowadays, OS-ELM is used to handle either sequential arrival of data or large amounts of data. However, there are currently emerging online learning applications which need real-time handling of data streams. These applications use the OS-ELM in the strict real-time sense. As an example, Chen et al. [27] used an ensemble of OS-ELMs and phase space reconstruction to recognize different types of flow oscillations and accurately forecast the trend of monitored plant variables. It was (...truncated)


Article home page: https://doaj.org/article/4a5fd0d7b1ea48cd83123ad3648292e3

Jose V. Frances-Villora, Alfredo Rosado-Muñoz, Manuel Bataller-Mompean, Juan Barrios-Aviles, Juan F. Guerrero-Martinez. Moving Learning Machine towards Fast Real-Time Applications: A High-Speed FPGA-Based Implementation of the OS-ELM Training Algorithm, Electronics, 2018, Volume 7, Issue 11, Article 308, DOI: 10.3390/electronics7110308