Efficient Pomegranate Segmentation with UNet: A Comparative Analysis of Backbone Architectures and Knowledge Distillation
ITM Web of Conferences 54, 01001 (2023)
I3CS-2023
https://doi.org/10.1051/itmconf/20235401001
Shubham Mane1,∗ , Prashant Bartakke1,∗∗ , and Tulshidas Bastewad2,∗∗∗
1 Department of Electronics and Telecommunication Engineering, COEP Technological University, India-411005
2 Department of Farm Implements and Machinery, Mahatma Phule Krishi Vidyapeeth, Rahuri-413722
Abstract. This work examines the segmentation of on-field images of
pomegranate fruit using the UNet model with different backbones. Precise and effective segmentation of pomegranate fruits in the field is essential for automating yield estimation, disease detection, and quality evaluation in the agricultural
industry. The models were trained and validated on real images captured in a pomegranate field. The study assesses the performance of several backbones: ResNet50, Inception ResNetV2, MobileNetv2, DenseNet121,
EfficientNet, VGG16, and VGG19. The VGG19 backbone achieved the highest
F1 score, 90.35%. In addition, we employed feature-based knowledge distillation to transfer knowledge from the VGG19 backbone to the lighter MobileNetv2 backbone (which has 45 times fewer parameters than VGG19), increasing the F1 score of MobileNetv2 from 86.97%
to 89.91%. Our findings show that the choice of backbone architecture strongly affects the effectiveness of the UNet model for
pomegranate fruit segmentation, and that knowledge distillation can improve the accuracy of
UNet models with lighter backbones without significantly increasing their computational complexity.
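The F1 scores quoted above are pixel-level segmentation metrics. As a minimal sketch, assuming binary masks in which 1 marks fruit and 0 marks background, the F1 score (equal to the Dice coefficient for binary masks) is F1 = 2TP / (2TP + FP + FN). Whether the reported values were averaged per image or accumulated over the whole test set is not stated here, and the function name `mask_f1` is illustrative only.

```python
import numpy as np


def mask_f1(pred_mask: np.ndarray, true_mask: np.ndarray) -> float:
    """Pixel-wise F1 (Dice) score for binary masks: 1 = fruit, 0 = background."""
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    tp = np.logical_and(pred, true).sum()    # fruit pixels predicted as fruit
    fp = np.logical_and(pred, ~true).sum()   # background pixels predicted as fruit
    fn = np.logical_and(~pred, true).sum()   # fruit pixels predicted as background
    denom = 2 * tp + fp + fn
    if denom == 0:  # both masks empty: treat as perfect agreement
        return 1.0
    return float(2 * tp / denom)


# One hit, one false positive, one miss: 2*1 / (2*1 + 1 + 1) = 0.5
print(mask_f1(np.array([[1, 0, 1, 0]]), np.array([[1, 1, 0, 0]])))  # 0.5
```

Computing the score from raw TP/FP/FN counts rather than from precision and recall avoids a division by zero when the prediction is empty.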
1 Introduction
The pomegranate is a widely cultivated fruit with high nutritional content, valued for its
health benefits. In response to the recent increase in demand for pomegranates and associated
products, automated systems based on computer vision techniques have been developed for
yield prediction, disease diagnosis, and quality assessment. Manual segmentation is
time-consuming and error-prone. Segmentation is the process of dividing an image into
different parts or segments based on visual characteristics such as colour, texture, or shape.
It enables the extraction of relevant information and features from images, which is a crucial
step in image analysis and computer vision. In pomegranate fruit detection, segmentation can
be used to separate the pomegranate fruit from other objects and the background [1]. In the
agricultural industry, precise segmentation of pomegranate fruit is an essential preprocessing
step for applications such as on-field disease detection, yield estimation, and quality
assessment, because complex and diverse backgrounds can degrade the performance of these
applications. Accurate segmentation isolates the fruit from the background and removes
unwanted features that might confuse downstream models, which is necessary for reliable
results in real-time scenarios.
© The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons
Attribution License 4.0 (https://creativecommons.org/licenses/by/4.0/).
On-field pomegranate fruit images are difficult to segment because of irregular fruit shapes,
occlusions, variations in size and color, and changing lighting conditions [2]. Overcoming these
obstacles requires robust computer vision algorithms that account for these factors and
segment pomegranate fruits accurately.
Deep learning models use artificial neural networks to recognize and learn patterns from data,
and they excel at image segmentation, which divides an image into regions based on its content.
UNet [3] is a popular segmentation architecture composed of two modules: an encoder and a
decoder. UNet is widely used for medical image segmentation, but its performance on images
such as on-field pomegranate fruit has not been thoroughly investigated. The segmentation
performance of UNet on pomegranate fruit therefore needs to be evaluated.
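To make the encoder-decoder structure concrete, the following toy NumPy sketch traces only the data flow of a UNet: the encoder extracts features and downsamples, the decoder upsamples and concatenates the matching encoder features through skip connections, and a final one-channel layer gives a per-pixel fruit probability. It is an illustration under loud assumptions: the learned convolution blocks are replaced by random 1×1 channel mixing followed by ReLU, the channel widths are hypothetical, and no training is involved. In the backbone variants compared in this paper, the encoder stages would instead come from a pretrained classification network such as VGG19 or MobileNetv2.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv_block(x: np.ndarray, out_ch: int) -> np.ndarray:
    """Stand-in for a learned conv block (assumption): random 1x1 channel mixing + ReLU.
    x has shape (H, W, C); the result has shape (H, W, out_ch)."""
    w = rng.standard_normal((x.shape[-1], out_ch)) * 0.1
    return np.maximum(x @ w, 0.0)


def maxpool2(x: np.ndarray) -> np.ndarray:
    """2x2 max pooling; H and W must be even."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))


def upsample2(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def toy_unet(image: np.ndarray, widths=(16, 32, 64, 128)) -> np.ndarray:
    """image: (H, W, 3) with H and W divisible by 2 ** (len(widths) - 1)."""
    skips = []
    x = image
    # Encoder: extract features, keep them for the skip connections, downsample.
    for ch in widths[:-1]:
        x = conv_block(x, ch)
        skips.append(x)
        x = maxpool2(x)
    x = conv_block(x, widths[-1])  # bottleneck at the coarsest resolution
    # Decoder: upsample, concatenate the encoder features of the same size, fuse.
    for ch, skip in zip(reversed(widths[:-1]), reversed(skips)):
        x = upsample2(x)
        x = np.concatenate([x, skip], axis=-1)
        x = conv_block(x, ch)
    # One output channel (fruit vs. background), squashed to a probability.
    logits = x @ (rng.standard_normal((x.shape[-1], 1)) * 0.1)
    return 1.0 / (1.0 + np.exp(-logits))


probs = toy_unet(np.random.default_rng(1).random((64, 64, 3)))
print(probs.shape)  # (64, 64, 1)
```

The skip connections are what let the decoder recover fine boundaries lost by pooling, which matters for the irregular fruit outlines described earlier.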
Deep learning models need large amounts of data to attain high accuracy, and gathering such data
for pomegranate fruit segmentation can be challenging. Deep learning models are also
computationally expensive, which can limit their usefulness and makes deployment on
resource-constrained devices difficult. To overcome these difficulties, we propose
using knowledge distillation [4] to transfer information from a larger model to a smaller one,
increasing the model's efficiency while preserving its accuracy.
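This section does not spell out the distillation objective, so the following NumPy sketch shows one common form of feature-based distillation (a FitNets-style hint loss) rather than the exact formulation used in this work. It assumes that an intermediate student feature map (e.g. from the MobileNetv2 encoder) is projected by a 1×1 adapter to the teacher's channel count (e.g. the VGG19 encoder) and matched with mean squared error, and that this term is added to a binary cross-entropy loss on the ground-truth mask. The weight `alpha`, the adapter, the choice of layer, and the equal spatial sizes of the two feature maps are all assumptions for illustration.

```python
import numpy as np


def bce(prob: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Binary cross-entropy between predicted fruit probabilities and a 0/1 mask."""
    prob = np.clip(prob, eps, 1 - eps)
    return float(-np.mean(target * np.log(prob) + (1 - target) * np.log(1 - prob)))


def feature_kd_loss(student_feat: np.ndarray, teacher_feat: np.ndarray,
                    adapter: np.ndarray) -> float:
    """MSE between adapted student features and frozen teacher features.
    student_feat: (H, W, Cs); teacher_feat: (H, W, Ct); adapter: (Cs, Ct), a 1x1 projection."""
    projected = student_feat @ adapter
    return float(np.mean((projected - teacher_feat) ** 2))


def distillation_objective(student_prob, mask, student_feat, teacher_feat,
                           adapter, alpha: float = 0.5) -> float:
    """Task loss on the ground-truth mask plus a weighted feature-matching term.
    alpha is a hypothetical weighting, not a value taken from the paper."""
    return bce(student_prob, mask) + alpha * feature_kd_loss(student_feat, teacher_feat, adapter)
```

In an actual training loop the teacher would be frozen, and the gradients of this objective would update only the student and the adapter, which requires an autodiff framework such as TensorFlow or PyTorch; the adapter is discarded after training, so the deployed student keeps its original size.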
This study makes several contributions to deep learning-based pomegranate
fruit segmentation. First, we developed a dataset of real-world images of
pomegranate fruit and used data augmentation to increase its diversity. Second, we assessed
UNet [3] models with various backbone architectures and showed how the choice of backbone
affects model performance. Third, we demonstrated that feature-based knowledge distillation
can transfer knowledge from larger, more complex models to smaller, simpler ones, allowing us
to obtain nearly state-of-the-art performance at substantially lower computational cost. Finally,
our research offers practical insights for developing and testing deep learning models for
image segmentation. Overall, our work can contribute to improving deep learning models for
pomegranate fruit segmentation and for other agricultural image analysis tasks.
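The specific augmentations applied to the dataset are not listed in this section. As a minimal NumPy sketch, assuming flips, 90-degree rotations, and brightness jitter as the augmentation set, the key requirement for segmentation data is that every geometric transform is applied identically to the image and its mask, while photometric changes touch the image only. The function name `augment_pair` and the jitter range are hypothetical.

```python
import numpy as np


def augment_pair(image: np.ndarray, mask: np.ndarray, rng: np.random.Generator):
    """Randomly augment an image (H, W, C) in [0, 1] and its mask (H, W) consistently."""
    k = int(rng.integers(0, 4))               # number of 90-degree rotations
    image = np.rot90(image, k, axes=(0, 1))
    mask = np.rot90(mask, k, axes=(0, 1))
    if rng.random() < 0.5:                     # horizontal flip, applied to both
        image = image[:, ::-1]
        mask = mask[:, ::-1]
    if rng.random() < 0.5:                     # brightness jitter: image only
        image = np.clip(image * rng.uniform(0.8, 1.2), 0.0, 1.0)
    return image, mask


rng = np.random.default_rng(0)
img = np.random.default_rng(1).random((8, 6, 3))
msk = (img[..., 0] > 0.5).astype(float)
aug_img, aug_msk = augment_pair(img, msk, rng)
print(aug_img.shape[:2] == aug_msk.shape)  # True
```

Applying a rotation to the image but not to the mask would silently corrupt the labels, so pairing the two in one function is a simple safeguard.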
The remainder of this work is organized as follows. Section 2 reviews related work.
Section 3 describes the study's methodology, including data preparation, model design, and
performance assessment measures. Section 4 presents the experimental findings, followed by
the discussion and conclusion in Section 5. Section 6 outlines future work.
2 Related work
The performance of a semantic segmentation network depends on its backbone classification network.
VGG-16, ResNet34, ResNet101, and Xception were compared as backbones for UNet, ResNet34, and ADLinkNet
on the CVPR DeepGlobe road extraction dataset. VGG-16 extracted long and wide roads better than
ResNet, whereas ResNet extracted small roads better. Xception handled complex occlusion situations
while retaining ResNet34's features [5].
Gómez-Flores et al. evaluated four CNN-based semantic segmentation models developed by the computer vision community: Fully Convolutional Network (FCN) with AlexNet,