Research on driverless vehicle vision algorithm
MATEC Web of Conferences 336, 06001 (2021)
CSCNS2020
https://doi.org/10.1051/matecconf/202133606001
Xinchao Liu1,*, Ying Yan1, and Haiyun Gan1
1 Tianjin University of Technology and Education, Tianjin 300222, China
Keywords: YOLOv3-SPP3-Tiny, complex scenes, target detection, model pruning.
Abstract. Obstacle detection in complex urban traffic environments has
become an important part of optimizing unmanned vehicles, and the
complexity of these environments poses great challenges to the reliability of
target detection for unmanned vehicles. YOLOv3, a deep learning
algorithm, performs well in target detection, but it has certain shortcomings
when detecting targets in complex urban traffic environments. In this paper,
a spatial pyramid module is added to YOLOv3 to improve the deep
model's feature extraction. Then, on the basis of the optimized network, the
target detection algorithm is streamlined by combining layer pruning and
channel pruning. The streamlined algorithm is called YOLOv3-SPP3-Tiny.
Compared with YOLOv3 on the Street Scenes dataset, YOLOv3-SPP3-Tiny
improves precision by 2.77% and mean average precision (mAP) by 0.87%,
while reducing total BFLOPS by 94.49% and inference time by 80.39%.
The experimental results show that YOLOv3-SPP3-Tiny is better suited to
object detection for unmanned vehicles in complex urban road
environments.
1 Introduction
Target detection is one of the most fundamental research areas for driverless cars.
With the rise of deep learning, a large number of target detection algorithms have been
applied to improve detection accuracy. Urban traffic environments are a difficult problem
in target detection research and, for that reason, an attractive one. For driverless vehicles
in particular, computer vision is one of the main research directions. Improving the accuracy
and speed of target detection in urban traffic environments is crucial to reducing the
accident rate.
* Corresponding author:
© The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons
Attribution License 4.0 (http://creativecommons.org/licenses/by/4.0/).
Traditional target detection algorithms can achieve satisfactory precision and speed, and
deep-learning-based algorithms have made major breakthroughs in machine vision tasks
(including target detection, classification and tracking) [1-6]. Machine vision parses a
target image into information that a computer can understand, and its understanding of an
image covers four main aspects: 1. Classification determines the category of the target;
2. Positioning determines the location of the target; 3. Detection combines classification
and positioning to determine both the class of the object and its position; 4. Segmentation,
which has two main types: semantic segmentation labels each pixel in the image with an
object category, while instance segmentation combines semantic segmentation with target
detection. The annotated images produced by these tasks are similar but differ in form, and
there is still a large gap between machine and human perception. Machine vision is affected
by object occlusion, low resolution and bad weather, all of which reduce detection
performance. In addition, the complexity of the urban traffic environment, which ranges
from motor vehicles, pedestrians and non-motor vehicles to buildings, trees and sidewalks,
also leads to missed and false detections by target detection algorithms. When driverless
vehicles perform detection in complex traffic environments, the main challenges are as
follows:
(1) Influence of light: Most vehicles reflect strongly in bright light, while weak light
also degrades the feature extraction of target detection.
(2) Complex and diverse types: The many types of vehicles and the varied clothing of
pedestrians in the urban traffic environment place high demands on target detection
performance.
(3) Target occlusion: With the large number of vehicles in urban traffic, the complex
traffic flow and the changeable movement of pedestrians, objects can appear suddenly where
pedestrians and cars, or cars and cars, cross paths or turn, which makes it difficult to
capture effective feature information in time.
(4) Background interference: Complex urban traffic scenes contain many billboards, shop
fronts of various types, cloudy skies and other complex background information, which
makes it difficult to obtain target information.
To overcome these obstacles, this work first selects Street Scenes, an urban traffic
dataset containing nine object categories. Second, an urban traffic detection model based
on YOLOv3 [7-9] is optimized. Within the YOLOv3 framework, a spatial pyramid structure is
added to each of the three detection heads to enhance deep feature extraction, and a
pruning strategy [10-12] is then developed to reduce the redundancy of the network model
and improve detection efficiency. The resulting model, YOLOv3-SPP3-Tiny, is proposed to
address the detection problem in urban traffic environments and to improve target detection
for driverless vehicles in urban traffic.
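As an illustration of how channel pruning can reduce network redundancy, the following minimal Python sketch ranks the absolute values of batch-normalization scale factors against a single global threshold, keeps the channels above it, and slices a convolution weight tensor accordingly. The selection criterion, the function names select_channels and prune_conv, the prune_ratio parameter and the toy values are assumptions made for illustration; this section does not specify the criterion actually used for YOLOv3-SPP3-Tiny.

```python
import numpy as np


def select_channels(gammas, prune_ratio):
    """Return one boolean keep-mask per layer.

    gammas: list of 1-D arrays, one array of scale factors per layer
            (hypothetical stand-ins for batch-norm weights).
    prune_ratio: fraction of all channels to remove, in [0, 1).
    """
    all_scales = np.sort(np.concatenate([np.abs(g) for g in gammas]))
    n_prune = int(len(all_scales) * prune_ratio)
    # Channels whose |gamma| is at or below the threshold are removed.
    threshold = all_scales[n_prune - 1] if n_prune > 0 else -np.inf
    masks = []
    for g in gammas:
        mask = np.abs(g) > threshold
        if not mask.any():
            # Never remove a layer completely: keep its strongest channel.
            mask[np.argmax(np.abs(g))] = True
        masks.append(mask)
    return masks


def prune_conv(weight, out_mask, in_mask=None):
    """Slice a conv weight of shape (out_c, in_c, kh, kw) with keep-masks."""
    pruned = weight[out_mask]
    if in_mask is not None:
        pruned = pruned[:, in_mask]
    return pruned


# Toy example: two consecutive layers with 4 and 2 output channels.
gammas = [np.array([0.9, 0.01, 0.5, 0.02]), np.array([0.03, 0.7])]
masks = select_channels(gammas, prune_ratio=0.5)

w0 = np.zeros((4, 3, 3, 3))  # first layer: 3 input channels, 3x3 kernels
w1 = np.zeros((2, 4, 1, 1))  # second layer consumes the first layer's output

print(prune_conv(w0, masks[0]).shape)            # (2, 3, 3, 3)
print(prune_conv(w1, masks[1], masks[0]).shape)  # (1, 2, 1, 1)
```

Removing an output channel from one layer also removes the matching input channel of the next layer, which is why prune_conv accepts both masks. Layer pruning would instead remove whole blocks, and is not shown here.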
2 Related work
2.1 Network optimization
YOLOv3 is a high-precision algorithm that resulted from continuous improvement of the
YOLO series [7-9]. It was developed on Darknet, a lightweight learning framework that is
fast and can make full use of multi-core processors and GPU parallel computation. To
achieve better detection, YOLOv3 [7-9] adds many well-performing 3 × 3 and 1 × 1
convolution layers to the original network and increases the number of shortcut links in
the network structure. With these improvements, the precision of YOLOv3 is much higher
than that of the other algorithms in the YOLO series [7-9]. However, in complex urban
traffic environments, YOLOv3 target detection is affected by object occlusion, multiple
motion states and bad weather, which reduce detection performance. In this paper, a
spatial pyramid pooling (SPP) module [13] is added to the YOLOv3 algorithm to improve the
deep features. The spatial pyramid module uses proportional pooling, which can output
features of fixed
dimensions without considering the size of the extracted feature map, mainly replacing the
full connection layer. The improvement point is to divide the image into several parts (...truncated)