A novel optimized tiny YOLOv3 algorithm for the identification of objects in the lawn environment

Scientific Reports, Sep 2022

To address the insufficient accuracy of the original tiny YOLOv3 algorithm for object detection in a lawn environment, an Optimized tiny YOLOv3 algorithm with less computation and higher accuracy is proposed. Three factors limit the accuracy of the original tiny YOLOv3 algorithm in this setting. First, the backbone of the original algorithm is a stack of alternating single convolutional layers and max-pooling layers, which limits its ability to extract object features; an enhancement module is proposed to strengthen the feature extraction capability of the shallow layers of the network. Second, the information in the shallow convolutional layers of the backbone is not fully used, which weakens the detection of small objects. Third, the deep part of the backbone uses a convolutional layer with an excessive number of channels, which results in a large amount of computation. A multi-resolution fusion module is proposed to strengthen the information interaction between the deep and shallow layers of the network and to reduce computation. To verify its accuracy, the Optimized tiny YOLOv3 algorithm was tested on a dataset containing trunks, spherical trees and people, and compared with recent work. The results show that the proposed algorithm improves detection accuracy while reducing computation.

Full text (PDF): https://www.nature.com/articles/s41598-022-19519-4.pdf


Xinyan Wang*, Feng Lv, Lei Li, Zhengyang Yi & Quan Jiang

With the ongoing construction of the ecological environment, the lawn industry is developing rapidly and lawn machinery is being applied ever more widely.
Lightweight object detection based on deep learning allows lawn machinery to identify objects on the lawn automatically, which can be used for obstacle avoidance or tree health detection and raises the machinery's degree of intelligence. Applying computer vision technology in the lawn environment is therefore of great significance. With the rapid development of deep learning techniques based on convolutional neural networks1–3, significant achievements have been made in the field of object detection. Object detection algorithms can generally be classified into two types: two-stage and one-stage algorithms. In the first stage of a two-stage algorithm, deep features of the image are extracted and an RPN (region proposal network) generates candidate regions. In the second stage, the object detection network screens these candidates and then classifies and localizes each region. Algorithms of this type achieve high accuracy but detect slowly; examples include R-CNN4 (region convolutional neural networks), fast R-CNN5, faster R-CNN6 and SPP7. The slow detection rate is attributed to the two stages in the algorithm. Conversely, a one-stage algorithm directly predicts the category and position of the object through the backbone network without an RPN. Although detection accuracy is reduced, detection speed is significantly improved. Commonly used one-stage algorithms are the SSD8,9 (single shot multibox detector) series and the YOLO10–12 (you only look once) series. The YOLO algorithm treats object detection as a regression problem: it processes the input image directly and outputs the result. At present, it is one of the fastest detection network architectures. However, the aforementioned algorithms rely on high-performance computing hardware.
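The regression framing can be made concrete with the box parameterization published with YOLOv3, which tiny YOLOv3 inherits: for each anchor the network regresses four raw values, and a decoding step turns them into a box. The sketch below is a minimal illustration of that standard decoding (sigmoid offsets inside a grid cell, exponential scaling of an anchor prior), not code from this paper; the function names, the stride and the anchor size in the usage line are illustrative assumptions.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, squashing a raw output into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_yolo_box(t, cell, anchor, stride):
    """Decode raw YOLOv3-style outputs into (center x, center y, width, height) in pixels.

    t      -- raw outputs (tx, ty, tw, th) for one anchor in one grid cell
    cell   -- integer grid-cell indices (col, row)
    anchor -- anchor prior size (pw, ph) in pixels
    stride -- input pixels per grid cell (hypothetical value in the example below)
    """
    tx, ty, tw, th = t
    col, row = cell
    pw, ph = anchor
    bx = (sigmoid(tx) + col) * stride  # center offset stays inside its grid cell
    by = (sigmoid(ty) + row) * stride
    bw = pw * math.exp(tw)  # size is a log-space rescaling of the anchor
    bh = ph * math.exp(th)
    return bx, by, bw, bh


# Hypothetical example: cell (6, 6) of a grid with stride 32, anchor 81 x 82 pixels.
print(decode_yolo_box((0.0, 0.0, 0.0, 0.0), (6, 6), (81, 82), 32))  # → (208.0, 208.0, 81.0, 82.0)
```

With all raw outputs at zero, the box sits at the center of its cell and takes exactly the anchor size; the sigmoid keeps each predicted center inside the cell responsible for it, which is part of what makes a single-pass regression stable.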
School of Mechanical Engineering, Jiangsu University of Science and Technology, Zhenjiang 212003, China.
Scientific Reports (2022) 12:15124, https://doi.org/10.1038/s41598-022-19519-4

Figure 1. Theory of the CIoU loss for bounding box regression.

Because embedded devices have limited computing power, the large neural networks mentioned above cannot be deployed on them. As efficient execution of deep neural networks on mobile devices has become a mainstream research topic13–15, the YOLO series has been developed further, and a tiny YOLO version better suited to deployment on mobile devices has been designed. Tiny YOLOv3 removes a large number of the convolutional layers used in the full YOLO networks and shrinks the network, which lowers the hardware computing requirements and increases detection speed. However, it also lowers detection accuracy and leads to a high missed-detection rate. Zhang16 improved the ability to detect pedestrians merely by adding convolutional layers to the tiny YOLOv3 network. Gai17 proposed an improved tiny YOLOv3 for real-time object detection by adding convolutional layers. He18 proposed TF-YOLO, which increases the detection accuracy of the tiny YOLOv3 network by adding one YOLO layer. Wu19 proposed a light YOLOv3 network to detect apples using a residual block composed of depthwise separable convolutions. Liu20 increased the detection accuracy of the tiny YOLOv3 network by adding one YOLO layer. The original tiny YOLOv3 algorithm has low detection accuracy on our lawn-environment object dataset. To address this, the contributions of this paper are as follows.
(1) We design an enhancement module to improve the detection accuracy of the backbone.
(2) We design a multi-resolution fusion module to enhance the information interaction inside the backbone and reduce the amount of computation.
(3) On this basis, the Optimized tiny YOLOv3 algorithm is proposed.
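The lightweighting strategies above, such as the depthwise separable convolutions used by Wu19, trade some accuracy for far fewer operations. The rough sketch below compares the operation count of a standard k×k convolution with a depthwise separable one (a k×k depthwise convolution followed by a 1×1 pointwise convolution), counting each multiply-add as two floating-point operations, which matches the factor of 2 in the paper's BFLOPs formula. Bias, batch normalization and activation costs are ignored, and the 13×13, 512 → 1024 channel layer is a hypothetical example rather than a layer specified by the paper.

```python
def standard_conv_ops(h, w, k, c_in, c_out):
    """Operations of a standard k x k convolution producing an h x w x c_out map.

    Each of the h * w * c_out outputs needs k * k * c_in multiply-adds, counted as 2 operations.
    """
    return 2 * h * w * k * k * c_in * c_out


def depthwise_separable_ops(h, w, k, c_in, c_out):
    """Operations of a k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = 2 * h * w * k * k * c_in  # one k x k filter per input channel
    pointwise = 2 * h * w * c_in * c_out  # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


# Hypothetical layer: 13 x 13 output, 3 x 3 kernels, 512 -> 1024 channels.
std = standard_conv_ops(13, 13, 3, 512, 1024)
sep = depthwise_separable_ops(13, 13, 3, 512, 1024)
print(f"standard: {std / 1e9:.3f} BFLOPs, separable: {sep / 1e9:.3f} BFLOPs, ratio: {sep / std:.3f}")
```

The ratio simplifies to 1/C_out + 1/k², about 0.112 for this layer, so nearly all of the saving comes from replacing the k×k channel-mixing product with a 1×1 one; whether the accuracy cost is acceptable is an empirical question for each dataset.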
Comparison experiments with three current lightweight YOLO algorithms show that the proposed algorithm is superior to them in both accuracy and degree of lightweighting.

Optimized tiny YOLOv3 algorithm network

Loss function. To improve the speed and convergence of bounding box regression, the CIoU21 (complete intersection over union) loss is adopted as the loss function:

$$L_{CIoU} = 1 - IoU + R_{CIoU} \tag{1}$$

$$R_{CIoU} = \frac{\rho^2\left(b, b^{gt}\right)}{f^2} + \alpha v \tag{2}$$

$$\alpha = \frac{v}{(1 - IoU) + v} \tag{3}$$

$$v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2 \tag{4}$$

where $R_{CIoU}$ is the penalty term; $\alpha$ is the weight function; $v$ is the similarity parameter of the aspect ratio; $w$, $h$ and $w^{gt}$, $h^{gt}$ are the widths and heights of the bounding box and the ground truth; $b$ and $b^{gt}$ represent the center points of the bounding box and the ground truth, respectively; $\rho$ is the Euclidean distance between the two center points; and $f$ is the diagonal length of the smallest rectangle that can contain both the bounding box and the ground truth, as shown in Fig. 1.

Enhancement module. BFLOPs (billions of floating-point operations) counts the floating-point operations of a convolutional neural network and is used to measure its amount of computation. For a convolutional layer it is calculated as

$$\text{Calculation} = \frac{2 H W K_h K_w C_{in} C_{out}}{10^9} \tag{5}$$

where $H$, $W$ and $C_{out}$ are the height, width and number of channels of the output feature map (...truncated)
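Eqs. (1) to (4) can be checked numerically with a direct transcription. The sketch below is a plain-Python, single-pair version written for readability, assuming boxes given as (center x, center y, width, height) with positive sizes; the function name and box format are assumptions, and a training implementation would instead operate on batched tensors so the loss can be back-propagated.

```python
import math


def ciou_loss(pred, gt):
    """CIoU loss of Eqs. (1)-(4) for one predicted box and one ground-truth box.

    Boxes are (cx, cy, w, h) with w > 0 and h > 0.
    """
    px, py, pw, ph = pred
    gx, gy, gw, gh = gt

    # Corner coordinates of both boxes.
    p_x1, p_y1, p_x2, p_y2 = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
    g_x1, g_y1, g_x2, g_y2 = gx - gw / 2, gy - gh / 2, gx + gw / 2, gy + gh / 2

    # IoU: intersection area over union area.
    inter_w = max(0.0, min(p_x2, g_x2) - max(p_x1, g_x1))
    inter_h = max(0.0, min(p_y2, g_y2) - max(p_y1, g_y1))
    inter = inter_w * inter_h
    iou = inter / (pw * ph + gw * gh - inter)

    # rho^2(b, b_gt): squared Euclidean distance between the two centers.
    rho2 = (px - gx) ** 2 + (py - gy) ** 2

    # f^2: squared diagonal of the smallest rectangle enclosing both boxes.
    f2 = (max(p_x2, g_x2) - min(p_x1, g_x1)) ** 2 + (max(p_y2, g_y2) - min(p_y1, g_y1)) ** 2

    # Eq. (4): aspect-ratio similarity term.
    v = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2

    # Eq. (3): weight; guard the 0/0 case when v = 0 (alpha * v is 0 either way).
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0

    # Eqs. (1) and (2).
    return 1 - iou + rho2 / f2 + alpha * v


# Prediction shifted one unit right of a 2 x 2 ground-truth box: IoU = 1/3, rho^2 = 1, f^2 = 13.
print(ciou_loss((1.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 2.0)))
```

The shifted-box call evaluates to 1 − 1/3 + 1/13 = 29/39 ≈ 0.744. Because both boxes share the same aspect ratio, v and therefore αv are zero, so only the IoU and center-distance terms contribute; stretching the prediction's height would add a positive αv on top.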


Article home page: https://www.nature.com/articles/s41598-022-19519-4

Wang, Xinyan, Lv, Feng, Li, Lei, Yi, Zhengyang, Jiang, Quan. A novel optimized tiny YOLOv3 algorithm for the identification of objects in the lawn environment, Scientific Reports, DOI: 10.1038/s41598-022-19519-4