Fast detection of bag-breakups in pulsating and steady airflow using video analysis and deep learning
Journal of Real-Time Image Processing
(2023) 20:114
https://doi.org/10.1007/s11554-023-01363-y
RESEARCH
Daiki Morita¹ · Bisser Raytchev¹ · Abdussalam Elhanashi² · Mikimasa Kawaguchi¹ · Yoichi Ogata¹ · Toru Higaki¹ ·
Kazufumi Kaneda¹ · Akira Nakashima³ · Sergio Saponara²
Received: 21 January 2023 / Accepted: 12 September 2023
© The Author(s) 2023
Abstract
Object detection methods based on deep learning have made great progress in recent years and have been used successfully
in many different applications. However, since they have been evaluated predominantly on datasets of natural images, it is
still unclear how accurate and effective they are in special-domain applications, for example on scientific or industrial images, whose properties are very different from those of images taken in natural scenes. In this study, we
illustrate the challenges faced in such a setting through a concrete practical application: the detection of a
particular fluid phenomenon (bag-breakup) in images of droplet scattering, which differ significantly from natural images.
Using two mature, state-of-the-art object detection methods, RetinaNet and YOLOv7, we discuss which
strategies need to be considered in this problem setting, and perform both quantitative and qualitative evaluations to study
their effects. We also propose a new method that further improves detection accuracy by utilizing information
from several consecutive frames. We hope that the practical insights gained in this study can be of use to other researchers
and practitioners when targeting applications where the images differ greatly from natural images.
Keywords Object detection · Scientific and industrial applications · Real-time processing · Small-size datasets · YOLOv7 ·
RetinaNet
1 Introduction
Object detection is one of the major tasks in computer
vision, and various deep learning-based models
with improved accuracy and detection speed have recently been proposed (see [3, 6, 7, 12] for recent comprehensive surveys
on object detection). However, applying deep learning-based
object detection models in practice raises various problems,
especially in special domains where images represent specific scientific phenomena or are taken in industrial
settings, and it is generally difficult to secure a large
amount of annotated data for such applications. The reason
for this is that in such cases (in contrast with natural scene
images), usually only experts can provide the annotations,
which makes the annotation process very costly. Therefore,
securing large annotated datasets for training
data-hungry deep learning models is either impossible or
impractical. Additionally, rare classes of objects or
phenomena to be detected often lead to the class
imbalance problem, which can make the learning process
difficult and unreliable.

* Bisser Raytchev
¹ Graduate School of Advanced Science and Engineering, Hiroshima University, Higashihiroshima, Japan
² Dip. Ingegneria Informazione, University of Pisa, Pisa, Italy
³ MBD Innovation Department, Mazda Motor Corporation, Fuchu, Japan
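As a rough illustration of one standard remedy for this kind of imbalance, the sketch below implements the binary focal loss introduced together with RetinaNet, one of the two detectors studied here. It scales the cross-entropy of each example by (1 - p_t)^gamma, where p_t is the probability assigned to the true class, so that the many easy, well-classified background examples contribute little and training concentrates on hard examples. The defaults alpha = 0.25 and gamma = 2.0 are the values reported in the original RetinaNet work; they are assumptions here, not necessarily the settings used in this study, and the function names are hypothetical.

```python
import math

EPS = 1e-12  # clamp probabilities away from 0 and 1 so log() stays finite


def _p_t(p: float, y: int) -> float:
    """Probability assigned to the true class (y = 1 bag-breakup, y = 0 background)."""
    p = min(max(p, EPS), 1.0 - EPS)
    return p if y == 1 else 1.0 - p


def cross_entropy(p: float, y: int) -> float:
    """Standard binary cross-entropy: CE(p_t) = -log(p_t)."""
    return -math.log(_p_t(p, y))


def focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    p: predicted probability that an anchor contains a bag-breakup (hypothetical setup).
    y: ground-truth label, 1 = bag-breakup, 0 = background.
    alpha, gamma: assumed RetinaNet defaults, not values taken from this study.
    """
    p_t = _p_t(p, y)
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # An easy (confidently rejected) background anchor vs. a hard, missed bag-breakup.
    cases = [("easy background", 0.01, 0), ("hard bag-breakup", 0.10, 1)]
    for name, p, y in cases:
        print(f"{name}: CE = {cross_entropy(p, y):.4g}, FL = {focal_loss(p, y):.4g}")
```

With these defaults, the loss of the easy background anchor shrinks by a factor of more than 10,000 relative to cross-entropy, while the hard positive keeps about 20% of its cross-entropy loss (part of that reduction comes from alpha_t = 0.25). This is why thousands of target-free anchors no longer swamp the few foreground ones.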
Furthermore, it is common practice to evaluate newly developed state-of-the-art object detection models mainly
on natural-image datasets such as the
PASCAL VOC (PASCAL Visual Object Classes Challenge)
dataset [5] and the COCO (Microsoft Common Objects in
Context) dataset [11]. Since the images in these datasets (and
the features extracted from them) differ significantly
from those targeted in typical scientific/industrial applications, it is unclear how accurate and effective the methods
are in such applications, and which of their components are critical to success. Sometimes, the practitioners
who need to provide the solutions (using appropriate object
detection methods) are scientists or other staff who are
not machine learning experts. For such people, it can
be very difficult to find their way in the extremely
rapidly evolving landscape of object detection algorithms,
which often leaves them unsure which method to use
and which strategies to follow to obtain
the best possible performance on their application.
Therefore, in this study, we apply two representative
object detection models to a small-size dataset from a
domain that differs significantly from natural images, and
examine their detection accuracy and the effects of the
models’ components on accuracy. The dataset used in this
study consists of basic experimental images of a
phenomenon called droplet dispersion, which occurs inside
an automobile exhaust pipe. The dataset is small (about
800 images), and the detection target is a
form of droplet dispersal called bag-breakup. In natural
scene images, a single image may contain dozens of
detection targets, or it is easy to obtain a
large number of images containing the targets of interest (cats,
cars, etc.). In contrast, an image in our dataset typically contains one,
and at most two, detection targets, while many thousands
of the images collected from the experiments
contain no detection target at all. What makes
droplet dispersion images even more challenging
is that the detection targets (bag-breakups in our task) are
very similar to the background in visual appearance and texture patterns, whereas in natural scene images it
is much easier to discriminate between, for example,
cats and their background. Finally, at test time, the trained
model has to process an enormous volume of experimental images that are generated daily and fed
to the system as a video stream; therefore, real-time processing capability is
crucial.
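Because the input arrives as a video stream, a detection in one frame can be checked against its neighbors. The sketch below shows one simple, generic way to do this: a temporal-persistence filter that keeps a box only if an overlapping box (intersection over union above a threshold) was also detected in enough recent frames, so isolated, single-frame false positives are dropped. It is a hypothetical illustration, not the consecutive-frame method the authors propose in Sect. 3.4; the class name TemporalFilter and the parameters window, min_hits and iou_thr are assumptions chosen for the example.

```python
from collections import deque
from typing import Deque, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels, x2 > x1, y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class TemporalFilter:
    """Hypothetical causal filter: keep a box only if it persists over recent frames."""

    def __init__(self, window: int = 3, min_hits: int = 2, iou_thr: float = 0.3):
        # Raw detections of the previous (window - 1) frames; the current frame fills the last slot.
        self.history: Deque[List[Box]] = deque(maxlen=window - 1)
        self.min_hits = min_hits
        self.iou_thr = iou_thr

    def update(self, boxes: List[Box]) -> List[Box]:
        """Feed the raw detections of one frame and return the confirmed ones."""
        kept = []
        for box in boxes:
            # The current frame counts as one hit; each earlier frame adds one if it has a match.
            hits = 1 + sum(
                any(iou(box, prev) >= self.iou_thr for prev in frame)
                for frame in self.history
            )
            if hits >= self.min_hits:
                kept.append(box)
        self.history.append(boxes)  # store unfiltered detections for later frames
        return kept


if __name__ == "__main__":
    filt = TemporalFilter(window=3, min_hits=2, iou_thr=0.3)
    frames = [
        [(10, 10, 50, 50)],      # frame 0: first sighting, not yet confirmed
        [(12, 11, 52, 51)],      # frame 1: overlaps frame 0, so it is kept
        [(200, 200, 220, 220)],  # frame 2: isolated box with no match, dropped
    ]
    for t, boxes in enumerate(frames):
        print(t, filt.update(boxes))
```

Running the example prints `0 []`, `1 [(12, 11, 52, 51)]` and `2 []`. Since the filter only looks at frames already seen, it needs no look-ahead buffer and costs little per frame, which suits a real-time stream; the trade-off is that the first sighting of every genuine bag-breakup is discarded until it is confirmed in a later frame.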
To address these issues, and provide guidance to practitioners who might have to deal with similar challenges, we
consider a set of solutions and architecture choices suitable
for such a setting and investigate their effectiveness. The
rest of this paper is organized as follows. First, in the next
section, we describe the concrete task that motivates our
research. In Sect. 3, we discuss the most common problems
and challenges that practitioners face when
applying object detection methods in special domains, and
show how these can be overcome using practices and architecture decisions developed in recent state-of-the-art object detection methods. Additionally, in
Sect. 3.4, we propose a new method that uses information from several consecutive frames to further improve
detection accuracy by eliminating false positives that
might occur in regions which closely resemble the visual
structure of bag-breakup patterns and are difficult even
for ex (...truncated)