Local Levenberg-Marquardt algorithm for learning feedforwad neural networks

http://yadda.icm.edu.pl/yadda/element/bwmeta1.element.baztech-fa51989f-052c-4998-867c-a2727a11fd80/c/lski_Local_Levenberg-Marquardt_Algorithm.pdf

JAISCR, 2020, Vol. 10, No. 4, pp. 299–316
DOI: 10.2478/jaiscr-2020-0020

LOCAL LEVENBERG-MARQUARDT ALGORITHM FOR LEARNING FEEDFORWAD NEURAL NETWORKS

Jarosław Bilski1,∗, Bartosz Kowalczyk1, Alina Marchlewska2, Jacek M. Zurada3

1 Department of Computer Engineering, Czestochowa University of Technology, al. Armii Krajowej 36, 42-200 Częstochowa, Poland
2 University of Social Science, Łódź, Poland and Clark University, Worcester, MA, USA
3 Department of Electrical and Computer Engineering, University of Louisville, Louisville, KY 40292, USA

∗ E-mail:

Submitted: 21st October 2019; Accepted: 19th May 2020

Abstract

This paper presents a local modification of the Levenberg-Marquardt algorithm (LM). First, the mathematical basics of the classic LM method are shown. The classic LM algorithm is very efficient for learning small neural networks. For bigger neural networks, however, the computational complexity grows significantly, which makes the method practically inefficient. In order to overcome this limitation, a local modification of the LM is introduced in this paper. The main goal of this paper is to develop a more computationally efficient modification of the LM method by using local computation. The introduced modification has been tested on the following benchmarks: function approximation and classification problems. The obtained results have been compared to the performance of the classic LM method. The paper shows that the local modification of the LM method significantly improves the algorithm's performance for bigger networks. Several proposals for future work are suggested.

Keywords: feed-forward neural network, neural network learning algorithm, optimization problem, Levenberg-Marquardt algorithm, QR decomposition, Givens rotation.

1 Introduction

Nowadays, artificial intelligence exists not only in the world of science but also in industry. One of the most interesting areas of AI is neural networks.
Each year researchers across the world produce a very large number of scientific papers whose main theme originates from AI, especially from neural networks, as in [1, 2, 3, 4, 5, 6, 7, 8, 9]. Industry strongly benefits from that by applying more and more advanced AI solutions in its products. The biggest industrial beneficiaries of neural networks are medicine and health care [10, 11, 12, 13, 14, 15], banking and finance [16, 17, 18], but also safety [19, 20, 21, 22, 23] and entertainment [24, 25, 26]. All applications of neural networks share a common feature: a network needs to be trained in order to solve a specific problem. There are many training algorithms which are derived directly from the original backpropagation method [27], such as [28, 29, 30]. There are also more complex algorithms which involve Newton's method, such as the Levenberg-Marquardt (LM) algorithm, which was initially proposed in [31]. The LM algorithm is a supervised training method that can be applied to any feedforward neural network, which from now on will also be referred to as "FF".

A neural network is built from layers. Each layer is built from a finite number of neurons. The last layer of the network acts as the network's output, hence it is called the output layer. In most practical applications, networks have more than a single layer. All layers preceding the output layer are called hidden layers.

Feedforward neural networks can have various topologies. The most common is the multilayer perceptron, also called an MLP. In such networks each layer is connected only to the previous one. An exemplary MLP network is shown in Figure 1. The next common FF topology is the fully connected multilayer perceptron, simply referred to as an FCMLP. This type of network is similar to the classic MLP, but with additional connections between layers.
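To make the difference between these two topologies concrete, the following Python sketch counts the trainable weights of an MLP and of an FCMLP built from the same layer sizes. The layer sizes (2 inputs, then layers of 2, 2 and 1 neurons), the function names, and the assumptions that every neuron has exactly one bias weight and that each FCMLP layer is also connected to the network inputs are illustrative choices, not details taken from the paper.

```python
def mlp_weight_count(n_inputs, layer_sizes):
    """Count weights when each layer sees only the previous layer (plus a bias)."""
    total = 0
    fan_in = n_inputs
    for size in layer_sizes:
        total += size * (fan_in + 1)  # +1 is the bias input of each neuron
        fan_in = size                 # the next layer sees only this layer
    return total


def fcmlp_weight_count(n_inputs, layer_sizes):
    """Count weights when each layer sees the inputs and all earlier layers."""
    total = 0
    fan_in = n_inputs
    for size in layer_sizes:
        total += size * (fan_in + 1)  # +1 is the bias input of each neuron
        fan_in += size                # later layers also see this layer
    return total


# Hypothetical 5-neuron network with 2 inputs (sizes assumed for illustration).
layers = [2, 2, 1]
mlp_total = mlp_weight_count(2, layers)
fcmlp_total = fcmlp_weight_count(2, layers)
print(mlp_total, fcmlp_total)
```

Under these assumptions the extra connections add weights only from the second layer onward, so the gap between the two counts widens as layers are added.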
As shown in Figure 2, each layer of the FCMLP network maintains connections to all preceding layers. Because of that, an FCMLP network contains many more weights than a standard MLP with the same neuron count. Another interesting variation is the fully connected cascade network (FCC). A network of this type is similar to an FCMLP network whose layers each contain only a single neuron. Each layer is connected to all preceding layers and to the network inputs, as shown in Figure 3. Originally, the FCC network was used by P. Werbos in his research on the backpropagation method [27]. It is worth noting that the FCMLP network is a special case of the FCC network, while the MLP network is a special case of the FCMLP network.

Figure 1. MLP network with 5 neurons and two hidden layers.

Figure 2. FCMLP network with 5 neurons and two hidden layers (additional connections are marked with dotted lines).

Figure 3. FCC network with 4 neurons.

While the LM algorithm is a very popular and robust method of finding a function minimum in most applications, it is still burdened with several disadvantages. Some of them are serious enough to make the LM algorithm completely impractical. Over the years, many researchers have attempted to devise optimization techniques for the LM algorithm. The LM algorithm is a second-order method which combines the advantages of the Gauss-Newton and the gradient descent methods. Like most neural network training algorithms, the classic LM can also become stuck in a local minimum. In classical first-order methods this problem can be addressed by applying a momentum factor. Such a modification helps to overshoot local minima and find the right direction towards the optimal solution. The momentum factor can be selected arbitrarily and remain fixed throughout the training, or be dynamically adjusted based on the convergence process.
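As a minimal illustration of a fixed momentum factor in a first-order method, the Python sketch below applies gradient descent with momentum to a one-dimensional quadratic. The function name, the learning rate, the momentum value and the test function are assumptions chosen for illustration; this is not the momentum LM variant discussed in the paper, and it shows only the form of the update rule, not the escape from a local minimum.

```python
def minimize_with_momentum(grad, w0, lr=0.1, momentum=0.9, steps=300):
    """Gradient descent with a fixed momentum factor on a scalar parameter."""
    w = float(w0)
    velocity = 0.0
    for _ in range(steps):
        # Keep a fraction of the previous step and add the new gradient step.
        velocity = momentum * velocity - lr * grad(w)
        w += velocity
    return w


# Illustrative objective f(w) = (w - 3)^2 with gradient 2 * (w - 3).
w_min = minimize_with_momentum(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_min)
```

Because part of the previous step is carried forward, the iterate can pass through a shallow dip instead of stopping at the first point where the gradient vanishes; an adaptive variant would change `momentum` during the loop based on how the error evolves.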
Such an approach has been presented in [32], where the main idea was to combine the advantages of the LM and CG methods in order to increase the robustness of the training. The authors developed two variants of the momentum Levenberg-Marquardt algorithm, one with a fixed and one with an adaptable momentum size. While the presented algorithms were shown to be more efficient in terms of training time than the classic LM, both were still burdened with the biggest of the LM disadvantages: a great computational complexity due to the size of a single Jacobian matrix.

[...] lized during the LM training. The authors have shown that this technique is able to increase the stability of the LM training process, but it is not able to decrease the computational complexity.

When approaching complex experiments, the classical methods of neural network training can exhibit a very poor convergence rate in flat spots of the error function. Typically, the flat spot problem causes a significant training slowdown due to low gradient values of the hidden neurons. In first-order methods such a problem can be ad (...truncated)