Improving optimization of convolutional neural networks through parameter fine-tuning

Neural Computing and Applications, Nov 2017

In recent years, convolutional neural networks have achieved state-of-the-art performance in a number of computer vision problems such as image classification. Prior research has shown that a transfer learning technique known as parameter fine-tuning wherein a network is pre-trained on a different dataset can boost the performance of these networks. However, the topic of identifying the best source dataset and learning strategy for a given target domain is largely unexplored. Thus, this research presents and evaluates various transfer learning methods for fine-grained image classification as well as the effect on ensemble networks. The results clearly demonstrate the effectiveness of parameter fine-tuning over random initialization. We find that training should not be reduced after transferring weights, larger, more similar networks tend to be the best source task, and parameter fine-tuning can often outperform randomly initialized ensembles. The experimental framework and findings will help to train models with improved accuracy.

Nicholas Becherer · John Pecarina · Scott Nykl · Kenneth Hopkinson
Air Force Institute of Technology, Dayton, OH 45433, USA

Keywords: Convolutional neural networks; Transfer learning; Computer vision; Parameter fine-tuning

1 Introduction

Convolutional neural networks (CNNs) are machine learning models that extend the traditional artificial neural network by adding increased depth and additional constraints to the early layers. Recent work has focused on tuning their architecture to achieve maximum performance on benchmarks such as the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [1, 2].

CNNs are not a new topic in the field of computer vision. They trace their origins back to the early 1980s with Fukushima's Neocognitron [4]. More directly, they were shown to be highly effective in the 1990s for handwritten digit recognition and, eventually, in industry for automated check readers [5, 6]. They rely on several successive convolutional layers to extract information from an image. Since convolution is a shift-and-slide operation, it is invariant to translations in the data. Most importantly, these convolutional layers are fully learnable through the backpropagation algorithm, meaning they can identify low- and high-level patterns through supervised training [7]. However, CNNs fell out of favor in the new millennium because of their difficulty scaling to larger problems [8]. Problems beyond optical character recognition or low-resolution imagery were either too computationally expensive or lacked enough training data to avoid overfitting.

Recently, CNNs have stepped back into the spotlight as these obstacles have been overcome. In 2012, Krizhevsky et al. [1] leveraged several recent advances to overcome them in the ILSVRC. First, they used NVIDIA's CUDA programming language to implement their CNN on a highly parallel GPU, reducing run time by orders of magnitude [9]. Second, the ImageNet competition included a dataset on the scale of millions of images automatically sourced from the Internet [10]. Combined with several new techniques such as dropout regularization [11] and simple data augmentation, they presented a model dubbed AlexNet that won the competition.
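The ingredients named above (stacked convolutional layers trained by backpropagation, dropout regularization, and simple data augmentation) can be made concrete with a short sketch. The snippet below is not the architecture used in the paper; it is a minimal AlexNet-style model written in PyTorch, with layer sizes and the 200-class output chosen purely for illustration.

```python
import torch
import torch.nn as nn
from torchvision import transforms

# Simple data augmentation of the kind used to train AlexNet-style models:
# random crops and horizontal flips of the input images.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

class SmallConvNet(nn.Module):
    """Illustrative CNN: stacked convolutions feeding a dropout-regularized classifier."""

    def __init__(self, num_classes: int = 200):
        super().__init__()
        # Convolutional layers act as shift-invariant feature extractors whose
        # filter weights are learned end-to-end by backpropagation.
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(128, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d((6, 6)),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),  # dropout regularization [11]
            nn.Linear(256 * 6 * 6, 1024), nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(1024, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))
```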
Since the introduction of AlexNet, the winning entries in the ImageNet competition have all been CNNs [2, 12, 13]. These newer CNNs have largely advanced the field by making the basic architecture deeper. The Oxford Visual Geometry Group's (VGG) network experimented with 11–19 learnable weight layers, finding that 19 was the optimal depth [13]. The current leading network, GoogLeNet, has 6.7 million learnable weights across 22 layers [2]. Others have focused on improving the performance of CNNs through data augmentation and training techniques [14]. Ultimately, however, all of these techniques still require large amounts of training data to be effective.

The need for so much data arises because CNN training is an extremely complex optimization problem. Networks are typically trained with stochastic gradient descent (SGD) to find minima of a loss function, and SGD needs large labeled training datasets to minimize effectively. Since SGD is a greedy method, it is not guaranteed to find the global minimum, which means that initialization can affect the final outcome. Weights are usually initialized by sampling from a Gaussian distribution [15]. However, it has been shown that a transfer learning technique known as parameter fine-tuning can improve the performance of a CNN compared to random initialization, sometimes to a substantial degree [3]. However, there is a transfer learning technique that ca (...truncated)
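To make the contrast above concrete, the sketch below shows both strategies side by side: Gaussian random initialization of all weights, and parameter fine-tuning that starts from weights pre-trained on a source dataset and continues training on the target task. It uses torchvision's ImageNet-pretrained AlexNet only as an example source network; the specific source datasets, hyperparameters, and training schedules evaluated in the paper are not reproduced here, and the 200-class target task is an arbitrary placeholder.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_TARGET_CLASSES = 200  # illustrative target task size, not taken from the paper

# --- Baseline: random initialization --------------------------------------
# Every learnable layer is drawn from a zero-mean Gaussian, the baseline the
# paper compares against.
random_net = models.alexnet(weights=None)
for m in random_net.modules():
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.normal_(m.weight, mean=0.0, std=0.01)
        nn.init.zeros_(m.bias)
random_net.classifier[6] = nn.Linear(4096, NUM_TARGET_CLASSES)

# --- Parameter fine-tuning -------------------------------------------------
# Start from weights learned on a large source dataset (ImageNet here, via
# torchvision's pretrained model), replace the task-specific output layer,
# and continue training all layers on the target data; the text notes that
# training should not be reduced after transferring the weights.
finetune_net = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
finetune_net.classifier[6] = nn.Linear(4096, NUM_TARGET_CLASSES)

# Both networks would then be trained with SGD on the target dataset.
optimizer = torch.optim.SGD(finetune_net.parameters(), lr=0.001, momentum=0.9)
```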


This is a preview of a remote PDF: https://link.springer.com/content/pdf/10.1007%2Fs00521-017-3285-0.pdf

Nicholas Becherer, John Pecarina, Scott Nykl, Kenneth Hopkinson. Improving optimization of convolutional neural networks through parameter fine-tuning, Neural Computing and Applications, 2017, pp. 1-11, DOI: 10.1007/s00521-017-3285-0