Efficient Pomegranate Segmentation with UNet: A Comparative Analysis of Backbone Architectures and Knowledge Distillation

Article PDF:

https://www.itm-conferences.org/articles/itmconf/pdf/2023/04/itmconf_I3cs2023_01001.pdf

ITM Web of Conferences 54, 01001 (2023), I3CS-2023
https://doi.org/10.1051/itmconf/20235401001

Shubham Mane1,∗, Prashant Bartakke1,∗∗, and Tulshidas Bastewad2,∗∗∗

1 Department of Electronics and Telecommunication Engineering, COEP Technological University, India-411005
2 Department of Farm Implements and Machinery, Mahatma Phule Krishi Vidyapeeth, Rahuri-413722

Abstract. This work examines the segmentation of on-field images of pomegranate fruit using the UNet model with different backbones. Precise and efficient segmentation of pomegranate fruits in the field is essential for automating yield estimation, disease detection, and quality evaluation in the agricultural industry. The models were trained and validated on real images captured in a pomegranate field. The study assesses the performance of several backbones: ResNet50, Inception ResNetV2, MobileNetV2, DenseNet121, EfficientNet, VGG16, and VGG19. The VGG19 backbone achieved the highest F1 score, 90.35%. In addition, we employed feature-based knowledge distillation to transfer knowledge from the VGG19 backbone to the lighter MobileNetV2 backbone (45x smaller than VGG19 in number of parameters), which increased the F1 score of MobileNetV2 from 86.97% to 89.91%. Our findings show that the choice of backbone architecture strongly affects the performance of the UNet model for pomegranate fruit segmentation, and that knowledge distillation can improve the accuracy of UNet models with lighter backbones without significantly increasing their computational complexity.

1 Introduction

The pomegranate is a widely cultivated fruit with high nutritional content that is valued for its health benefits.
Automated systems based on computer vision techniques have been developed for pomegranate yield prediction, disease diagnosis, and quality assessment in response to the recent increase in demand for pomegranates and associated goods. Segmentation is the division of an image into parts, or segments, based on visual characteristics such as colour, texture, or shape. It enables the extraction of relevant information and features from images, which is a crucial step in image analysis and computer vision. Performing segmentation by hand is slow and prone to mistakes. In the context of pomegranate fruit detection, segmentation can be used to separate the pomegranate fruit from other objects and the background [1]. In the agriculture industry, precise segmentation of pomegranate fruit is an essential preprocessing step for applications such as on-field disease detection, yield estimation, and quality assessment. A complex and diverse background can degrade the performance of these applications. Accurate segmentation removes unwanted features that may confuse the model and, by properly isolating the fruit from the background, supports reliable results in real-time scenarios. On-field pomegranate fruit images are difficult to segment because of the fruit’s irregular shape, occlusions, variations in size and colour, and changing lighting conditions [2]. Overcoming these obstacles requires reliable computer vision algorithms that account for these factors and segment pomegranate fruits accurately.

© The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0 (https://creativecommons.org/licenses/by/4.0/).
Deep learning models use artificial neural networks to recognize and learn patterns in data. These models excel at image segmentation, which divides an image into regions based on its content, because they segment images using learned features. UNet models [3] are popular for image segmentation. The architecture comprises two modules, namely an encoder and a decoder. UNet is widely used for medical image segmentation, but its performance on images such as those of pomegranate fruit has not been thoroughly investigated. Thus, the UNet model’s pomegranate fruit segmentation performance must be evaluated. To attain high accuracy, deep learning models need a lot of data, and gathering this data for pomegranate fruit segmentation can be challenging. Deep learning models are also computationally expensive, which may restrict their usefulness and makes them difficult to deploy on devices with limited resources. To overcome these difficulties, we propose using knowledge distillation [4] to transfer knowledge from a larger model to a smaller one, increasing the model’s efficiency while preserving its accuracy.

We make several contributions to deep learning-based pomegranate fruit segmentation in this study. First, we developed a dataset of real-world images of pomegranate fruit and used data augmentation to increase its diversity. Second, we assessed the effectiveness of UNet [3] models with various backbone architectures and showed how the choice of backbone affects the model’s performance. Third, we demonstrated that feature-based knowledge distillation can transfer knowledge from larger, more complex models to smaller, simpler models, which allowed us to obtain nearly state-of-the-art performance at substantially lower computing cost.
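As a rough illustration of the feature-based knowledge distillation mentioned above, the following minimal sketch combines a segmentation loss on the student's output with a feature-matching term that pulls the student's intermediate features towards the teacher's. The exact objective used in this work is not given in this section, so everything here is an assumption: the function names, the choice of binary cross-entropy and mean squared error, the weight `beta`, and the use of flat Python lists in place of feature tensors are all hypothetical.

```python
import math


def binary_cross_entropy(pred, mask, eps=1e-7):
    """Mean pixel-wise BCE between predicted probabilities and a 0/1 mask."""
    total = 0.0
    for p, t in zip(pred, mask):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(pred)


def feature_mse(student_feat, teacher_feat):
    """Mean squared error between two flattened feature maps of equal length."""
    if len(student_feat) != len(teacher_feat):
        raise ValueError("student and teacher features must have the same size")
    squared = [(s - t) ** 2 for s, t in zip(student_feat, teacher_feat)]
    return sum(squared) / len(squared)


def distillation_loss(student_pred, mask, student_feat, teacher_feat, beta=0.5):
    """Assumed objective: task loss plus beta times the feature-matching loss."""
    task = binary_cross_entropy(student_pred, mask)
    match = feature_mse(student_feat, teacher_feat)
    return task + beta * match


if __name__ == "__main__":
    # Toy values standing in for a 4-pixel mask and 3-element feature maps.
    mask = [1, 0, 1, 0]
    student_pred = [0.9, 0.2, 0.7, 0.1]
    teacher_feat = [0.5, -1.0, 2.0]   # e.g. from a frozen VGG19-based teacher
    student_feat = [0.4, -0.8, 1.5]   # e.g. from a MobileNetV2-based student
    print(distillation_loss(student_pred, mask, student_feat, teacher_feat))
```

In a real setup the teacher is typically kept frozen, and the student's feature maps usually need a small projection layer so that their shape matches the teacher's before the matching term can be computed; that step is omitted here.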
Last but not least, our research sheds light on practical aspects of developing and testing deep learning models for image segmentation tasks. Overall, our work can contribute to the improvement of deep learning models for pomegranate fruit segmentation as well as other agricultural image analysis tasks.

The remaining sections of this work are organized as follows: Section 2 reviews related work. Section 3 describes the study’s methodology, including data preparation, model design, and performance assessment measures. Section 4 contains the experimental findings, followed by the discussion and conclusion in Section 5. Section 6 is reserved for future work.

2 Related work

A semantic segmentation network’s performance depends on its classification network. VGG-16, ResNet34, ResNet101, and Xception were compared as backbones for UNet, ResNet34, and ADLinkNet using the CVPR DeepGlobe road extraction dataset. VGG-16 extracted long and wide roads better than ResNet, whereas ResNet extracted small roads better. Xception, however, handled complex occlusion situations while retaining ResNet34’s features [5]. Gómez-Flores et al. have evaluated four CNN-based semantic segmentation models developed by the computer vision community: Fully Convolutional Network (FCN) with AlexNet, (...truncated)