A novel optimized tiny YOLOv3 algorithm for the identification of objects in the lawn environment
www.nature.com/scientificreports
Xinyan Wang*, Feng Lv, Lei Li, Zhengyang Yi & Quan Jiang
To address the insufficient accuracy of the original tiny YOLOv3 algorithm for object
detection in a lawn environment, an Optimized tiny YOLOv3 algorithm with less computation and
higher accuracy is proposed. Three factors limit the original tiny YOLOv3 algorithm
when detecting objects in a lawn environment. First, the backbone of the original algorithm is
a stack of single convolutional layers and max-pooling layers, which limits its ability
to extract feature information of objects. An enhancement module is proposed to strengthen the feature
extraction capability of the shallow layers of the network. Second, the information of the shallow
convolutional layers of the backbone is not fully used, which results in insufficient detection capability
for small objects. Third, the deep part of the backbone uses a convolutional layer with an excessive
number of channels, which results in a large amount of computation. A multi-resolution fusion
module is proposed to enhance the information interaction between the deep and shallow layers
of the network and to reduce the computation. To verify the accuracy of the Optimized tiny YOLOv3
algorithm, it was tested on a dataset containing the classes trunk, spherical tree and person,
and compared with current research. The results show that the proposed algorithm
improves the detection accuracy while reducing the amount of computation.
With the construction of the ecological environment, the lawn industry is developing rapidly, and lawn machinery is being applied more and more widely. Lightweight object detection technology
based on deep learning allows lawn machinery to automatically identify objects on the lawn, which can be
used for obstacle avoidance or tree health detection to improve the degree of intelligence. Therefore, applying
computer vision technology in the lawn environment is of great significance.
With the rapid development of deep learning techniques based on convolutional neural
networks1–3, significant achievements have been made in the field of object detection. Generally, object detection algorithms can be
classified into two types: two-stage and one-stage algorithms. The first stage of a two-stage algorithm
extracts the depth features of the image through an RPN (region proposal network) to generate candidate
regions. The second stage selects the candidates through the object detection network. Subsequently, the regions
are classified and located. Algorithms of this type exhibit a high accuracy rate, but their detection speed is slow;
examples of two-stage algorithms include R-CNN4 (region convolutional neural networks), Fast R-CNN5, Faster
R-CNN6 and SPP7. The slow detection rate is attributed to the two stages in the algorithm.
Conversely, a one-stage algorithm directly predicts the category and position of the object through the
backbone network without an RPN. Although the detection accuracy is reduced, the detection speed
is significantly improved. Commonly used algorithms are the SSD8,9 (single shot multibox detector) series and the
YOLO10–12 (you only look once) series. The YOLO algorithm treats the object detection task as a regression
problem. It can directly process the input image and output the result. At present, it is one of the fastest detection
network architectures. However, the aforementioned algorithms rely on high-performance computing hardware.
Because the computing ability of embedded devices is limited, such large neural networks cannot be deployed
on them.
Since the efficient execution of deep neural networks on mobile devices has become part of mainstream
research13–15, the YOLO series is being developed further; a tiny YOLO series that is more suitable for
deployment on mobile devices has been designed. Tiny YOLOv3 removes a large number of the convolutional layers used in the YOLO series and reduces the size of the network, thereby reducing the hardware computing power
requirements and increasing the detection speed. However, this also reduces the detection accuracy and results in a
high missed-detection rate. Zhang16 improved the ability to detect pedestrians by merely adding convolutional
layers to the tiny YOLOv3 network. Gai17 proposed the improved tiny YOLOv3 for real-time object detection by
adding convolutional layers. He18 proposed the TF-YOLO to increase the detection accuracy of the tiny YOLOv3
network by adding one YOLO layer. Wu19 proposed a light YOLOv3 network to detect apples using a residual
block composed of depthwise separable convolutions. Liu20 increased the detection accuracy of the tiny YOLOv3
network by adding one YOLO layer.
School of Mechanical Engineering, Jiangsu University of Science and Technology, Zhenjiang 212003, China. *email:
Scientific Reports | (2022) 12:15124 | https://doi.org/10.1038/s41598-022-19519-4
Figure 1. Theory of CIoU loss regression of bounding box.
The original tiny YOLOv3 algorithm has low detection accuracy on our lawn environment target dataset. To address this, the paper makes the following contributions.
(1) We design an enhancement module to improve the detection accuracy of the backbone. (2) We design a multi-resolution fusion module to enhance the information interaction capability inside the backbone and reduce
the amount of computation. (3) On this basis, the Optimized tiny YOLOv3 algorithm is proposed. Comparison
experiments with three current lightweight YOLO algorithms show that the algorithm proposed in this paper
is superior to the others in terms of accuracy and lightweight degree.
Optimized tiny YOLOv3 algorithm network
Loss function. To improve the speed and convergence of the bounding box regression, the CIoU21 (complete
intersection over union) loss is adopted as the loss function,
\[ L_{CIoU} = 1 - IoU + R_{CIoU}. \tag{1} \]
R_CIoU is the penalty term, defined as
\[ R_{CIoU} = \frac{\rho^{2}(b, b^{gt})}{f^{2}} + \alpha v, \tag{2} \]
\[ \alpha = \frac{v}{(1 - IoU) + v}, \tag{3} \]
\[ v = \frac{4}{\pi^{2}} \left( \arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h} \right)^{2}, \tag{4} \]
where α is the weight function; v is the similarity parameter of the aspect ratio; b and b^gt represent the center
points of the bounding box and the ground truth, respectively; w, h and w^gt, h^gt are the widths and heights of the
bounding box and the ground truth; ρ is the Euclidean distance between the two
center points; f is the diagonal length of the smallest rectangle that can contain both the bounding box and the
ground truth, as shown in Fig. 1.
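The CIoU loss defined above can be sketched in a few lines of plain Python. This is a minimal illustration for a single pair of boxes, not the training code used in the paper: the function name `ciou_loss`, the `(cx, cy, w, h)` box format and the small `eps` guard against division by zero are all assumptions made for this example.

```python
import math


def ciou_loss(box, gt, eps=1e-9):
    """CIoU loss for two boxes given as (cx, cy, w, h) with w, h > 0.

    Hypothetical helper for illustration; eps only guards the divisions.
    """
    (bx, by, bw, bh), (gx, gy, gw, gh) = box, gt

    # Corner coordinates of the predicted box and the ground truth.
    bx1, by1, bx2, by2 = bx - bw / 2, by - bh / 2, bx + bw / 2, by + bh / 2
    gx1, gy1, gx2, gy2 = gx - gw / 2, gy - gh / 2, gx + gw / 2, gy + gh / 2

    # IoU: intersection area over union area.
    inter_w = max(0.0, min(bx2, gx2) - max(bx1, gx1))
    inter_h = max(0.0, min(by2, gy2) - max(by1, gy1))
    inter = inter_w * inter_h
    union = bw * bh + gw * gh - inter
    iou = inter / (union + eps)

    # rho^2: squared Euclidean distance between the two center points.
    rho2 = (bx - gx) ** 2 + (by - gy) ** 2

    # f^2: squared diagonal of the smallest rectangle enclosing both boxes.
    enc_w = max(bx2, gx2) - min(bx1, gx1)
    enc_h = max(by2, gy2) - min(by1, gy1)
    f2 = enc_w ** 2 + enc_h ** 2 + eps

    # v: aspect-ratio similarity; alpha: its weight.
    v = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(bw / bh)) ** 2
    alpha = v / ((1 - iou) + v + eps)

    # L_CIoU = 1 - IoU + rho^2 / f^2 + alpha * v.
    return 1 - iou + rho2 / f2 + alpha * v


# Two 2x2 boxes whose centers are 4 units apart (no overlap).
pred = (0.0, 0.0, 2.0, 2.0)
truth = (4.0, 0.0, 2.0, 2.0)
print(ciou_loss(pred, truth))
```

For the two boxes in the usage lines, IoU = 0, ρ² = 16, f² = 6² + 2² = 40 and v = 0, so the loss is approximately 1 + 16/40 = 1.4. Unlike a plain IoU loss, the center-distance term still gives the regression a useful signal when the boxes do not overlap.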
Enhancement module. BFLOPs (billion floating-point operations) is the number of floating-point
operations of the convolutional neural network, expressed in billions, and it is used to measure the amount of computation. For a
convolutional layer, it is calculated as follows,
\[ \mathrm{Calculation} = \frac{2 H W K_{h} K_{w} C_{in} C_{out}}{10^{9}}, \tag{5} \]
where H, W and C_out are the height, width and number of channels of the output feature map (...truncated)