Local Levenberg-Marquardt algorithm for learning feedforward neural networks
JAISCR, 2020, Vol. 10, No. 4, pp. 299 – 316
10.2478/jaiscr-2020-0020
Jarosław Bilski1,∗, Bartosz Kowalczyk1, Alina Marchlewska2, Jacek M. Zurada3
1 Department of Computer Engineering, Czestochowa University of Technology,
al. Armii Krajowej 36, 42-200 Częstochowa, Poland
2 University of Social Science, Łódź, Poland
and Clark University Worcester, MA, USA
3 Department of Electrical and Computer Engineering,
University of Louisville, Louisville, KY 40292, USA
∗ E-mail:
Submitted: 21st October 2019; Accepted: 19th May 2020
Abstract
This paper presents a local modification of the Levenberg-Marquardt algorithm (LM).
First, the mathematical basics of the classic LM method are shown. The classic LM algorithm is very efficient for training small neural networks. For bigger neural networks,
however, its computational complexity grows significantly, which makes the method impractical. To overcome this limitation, a local modification of the LM is introduced
in this paper. The main goal is to develop a computationally cheaper modification of the LM method by using local computation. The introduced modification
has been tested on function approximation and classification benchmarks. The obtained results have been compared to the performance of the classic LM
method. The paper shows that the local modification of the LM method significantly
improves the algorithm’s performance for bigger networks. Several proposals
for future work are suggested.
Keywords: feed-forward neural network, neural network learning algorithm, optimization problem, Levenberg-Marquardt algorithm, QR decomposition, Givens rotation.
1 Introduction
Nowadays, artificial intelligence exists not only
in the world of science but also in industry. One of
the most interesting areas of AI is neural networks.
Each year researchers across the world produce a
very large number of scientific papers whose main
theme originates from AI, especially from neural
networks, as in [1, 2, 3, 4, 5, 6, 7, 8, 9]. Industry
strongly benefits from that by applying more and
more advanced AI solutions in its products. The
biggest industrial beneficiaries of neural networks
are medicine and health care [10, 11, 12,
13, 14, 15], banking and finance [16, 17, 18],
as well as safety [19, 20, 21, 22, 23] and entertainment [24, 25, 26].
All applications of neural networks share a
common feature: a network needs to be
trained in order to solve a specific problem.
There are many training algorithms which are derived directly from the original backpropagation
method [27] such as [28, 29, 30]. There are also
more complex algorithms which involve Newton’s
method, such as the Levenberg-Marquardt (LM) algorithm, which was initially proposed in [31].
The LM algorithm is a supervised training
method that can be applied to any feedforward neural network, from now on also referred to as "FF". A neural network is built from layers. Each layer is built from a finite number of
neurons. The last layer of the network acts as the network’s output, hence it
is called the output layer. Most practical applications use networks with more than a single layer. All layers preceding the network’s output are called hidden layers. Feedforward neural
networks can have various topologies. The most
common is the multilayer perceptron, also called an
MLP. In such networks each layer is connected only
to the previous one. An exemplary MLP network is
shown in Figure 1. The next common FF topology
is the fully connected multilayer perceptron, referred to as an FCMLP. This type of network
is similar to the classic MLP but has additional layer
connections. As shown in Figure 2, each layer of
the FCMLP network maintains connections to all
preceding layers. As a result, an FCMLP network
contains many more weights than a standard MLP
with the same neuron count. Another interesting variation is the fully connected
cascade (FCC) network. Such a network
is similar to an FCMLP network whose layers each contain only a single neuron. Each layer is connected
to all preceding layers and network inputs, as shown
in Figure 3. Originally, the FCC network was used
by P. Werbos in his research on the backpropagation
method [27]. It is worth noting that the FCMLP
network is a special case of the FCC network, while
the MLP network is a special case of the FCMLP
network.
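The difference in weight counts between the three topologies can be made concrete with a minimal Python sketch. The layer sizes used in the usage note (2 inputs, layers of 2, 2 and 1 neurons), the function names, the single bias weight per neuron, and the treatment of the network inputs as a preceding layer in the FCMLP case are assumptions made for illustration; they are not read from the figures.

```python
def mlp_weights(n_inputs, layers):
    """Count weights when each layer sees only the previous layer."""
    total, prev = 0, n_inputs
    for size in layers:
        total += size * (prev + 1)  # +1 is the bias input of each neuron
        prev = size
    return total


def fcmlp_weights(n_inputs, layers):
    """Count weights when each layer sees the inputs and all earlier layers."""
    total, visible = 0, n_inputs
    for size in layers:
        total += size * (visible + 1)  # +1 is the bias input of each neuron
        visible += size  # this layer's outputs feed every later layer
    return total


def fcc_weights(n_inputs, n_neurons):
    """An FCC is treated here as an FCMLP with one neuron per layer."""
    return fcmlp_weights(n_inputs, [1] * n_neurons)
```

Under these assumptions, `mlp_weights(2, [2, 2, 1])` gives 15 weights and `fcmlp_weights(2, [2, 2, 1])` gives 23 for the same five neurons, while `fcc_weights(2, 4)` gives 18 weights for four cascaded neurons.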
Figure 1. MLP network with 5 neurons and two hidden layers.
Figure 2. FCMLP network with 5 neurons and two hidden layers (excessive connections are marked with the dotted line).
Figure 3. FCC network with 4 neurons.
While the LM algorithm is a very popular and
robust method of finding the function minimum in
most applications, it is still burdened with several
disadvantages. Some of them are serious enough to
make the LM algorithm completely
impractical. Over the years many researchers
have attempted to devise optimization techniques for the LM algorithm.
The LM algorithm is a second-order method
which combines the advantages of the Gauss-Newton and the gradient descent methods. Like most
neural network training algorithms, the classic LM can become stuck in a local minimum. In classical first-order methods this problem can be addressed by applying a momentum factor. Such a modification helps to overshoot local minima and find the right direction towards the
optimal solution. The momentum factor can be selected arbitrarily and remain fixed throughout the training, or be dynamically adjusted based on the convergence process. Such an approach has been presented
in [32], where the main idea was to combine the
advantages of the LM and CG methods in order to
increase the robustness of the training. The authors
developed two variants of the momentum Levenberg-Marquardt algorithm, with
fixed and with adaptable momentum size. While the presented algorithms were proven to be more efficient
in terms of training time than the classic LM,
both were still burdened with the biggest of
the LM disadvantages: high computational complexity due to the size of a single Jacobian matrix.
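The source of this cost can be illustrated with a single LM step. The sketch below assumes the standard update rule dw = -(J^T J + mu I)^(-1) J^T e, where J is the Jacobian of the error vector e with respect to the weights and mu is the damping factor. The sign convention (error = output minus target), the function names and the toy data in the usage note are assumptions for illustration, not the authors' implementation. Since J has one row per training pattern and network output and one column per weight, J^T J is an n x n matrix for n weights, and it must be built and solved in every step.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def lm_step(jacobian, errors, mu):
    """Return the weight update dw solving (J^T J + mu I) dw = -J^T e."""
    n = len(jacobian[0])  # number of weights
    # Approximate Hessian J^T J: n x n entries, each a sum over all rows of J.
    jtj = [[sum(row[i] * row[j] for row in jacobian) for j in range(n)]
           for i in range(n)]
    for i in range(n):
        jtj[i][i] += mu  # damping: large mu approaches scaled gradient descent
    # Gradient J^T e.
    grad = [sum(row[i] * e for row, e in zip(jacobian, errors))
            for i in range(n)]
    return solve(jtj, [-g for g in grad])
```

As a usage example under these assumptions, fitting the line y = w0*x + w1 to the points (0, 1), (1, 3), (2, 5) from zero weights gives the Jacobian `[[0, 1], [1, 1], [2, 1]]` and errors `[-1, -3, -5]`. With `mu = 0.0` the call `lm_step(...)` returns approximately `[2, 1]`, the Gauss-Newton step, whereas a very large `mu` yields a tiny step along the negative gradient.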
lized during the LM training. The authors have
proved that this technique can increase the stability of the LM training process, but it cannot
decrease the computational complexity.
In complex experiments, the
classical methods of neural network training can
exhibit a very poor convergence rate in flat spots
of the error function. Typically, the flat spot problem causes a significant training slowdown due to
low gradient values of the hidden neurons. In the
first-order methods such problem can be ad (...truncated)