Fast detection of bag-breakups in pulsating and steady airflow using video analysis and deep learning

Journal of Real-Time Image Processing (2023) 20:114
https://doi.org/10.1007/s11554-023-01363-y

RESEARCH

Daiki Morita1 · Bisser Raytchev1 · Abdussalam Elhanashi2 · Mikimasa Kawaguchi1 · Yoichi Ogata1 · Toru Higaki1 · Kazufumi Kaneda1 · Akira Nakashima3 · Sergio Saponara2

Received: 21 January 2023 / Accepted: 12 September 2023
© The Author(s) 2023

Abstract
Object detection methods based on deep learning have made great progress in recent years and have been used successfully in many different applications. However, since they have been evaluated predominantly on datasets of natural images, it is still unclear how accurate and effective they are in special-domain applications, for example on scientific or industrial images, whose properties differ greatly from those of images taken in natural scenes. In this study, we illustrate the challenges that arise in such a setting through a concrete practical application: the detection of a particular fluid phenomenon, bag-breakup, in images of droplet scattering, which differ significantly from natural images. Using two technologically mature and state-of-the-art object detection methods, RetinaNet and YOLOv7, we discuss which strategies need to be considered in this problem setting, and perform both quantitative and qualitative evaluations to study their effects. We also propose a new method that further improves detection accuracy by utilizing information from several consecutive frames. We hope that the practical insights gained in this study can be of use to other researchers and practitioners when targeting applications where the images differ greatly from natural images.
Keywords Object detection · Scientific and industrial applications · Real-time processing · Small-size datasets · YOLOv7 · RetinaNet

* Bisser Raytchev
1 Graduate School of Advanced Science and Engineering, Hiroshima University, Higashihiroshima, Japan
2 Dip. Ingegneria Informazione, University of Pisa, Pisa, Italy
3 MBD Innovation Department, Mazda Motor Corporation, Fuchu, Japan

1 Introduction

Object detection is one of the major tasks in computer vision, and various deep learning-based models with improved accuracy and detection speed have recently been proposed (see [3, 6, 7, 12] for recent comprehensive surveys on object detection). However, in practice, there are various problems in applying deep learning-based object detection models, especially in special domains, where images represent specific scientific phenomena or are taken in industrial settings, and it is generally difficult to secure a large amount of annotated data for such applications. The reason is that in such cases, in contrast with natural scene images, usually only experts can provide the annotations, which makes the annotation process very costly. Securing large annotated datasets for training the data-hungry deep learning models is therefore either impossible or impractical. Additionally, the presence of rare classes of objects or phenomena to be detected often leads to the class imbalance problem, which can make the learning process difficult and unreliable. Furthermore, it is common practice that newly developed and state-of-the-art object detection models are mainly evaluated using natural image-based datasets such as the PASCAL VOC (PASCAL Visual Object Classes Challenge) dataset [5] and the COCO (Microsoft Common Objects in Context) dataset [11].
Since the images in these datasets (and the features extracted from them) differ significantly from those targeted in general scientific/industrial applications, it is unclear how accurate and effective the methods are, and which of their components are critical for the success of the targeted application. Sometimes, the practitioners who need to provide the solutions (using appropriate object detection methods) are scientists or other staff who are not machine learning experts. For such people, it can be very difficult to orient themselves in the rapidly evolving landscape of object detection algorithms, which often leaves them unsure which method to use and which strategies to follow to obtain the best possible performance on their application.

Therefore, in this study, we apply two representative object detection models to a small-size dataset from a domain that differs significantly from natural images, and examine their detection accuracy and the effects of the models' components on accuracy. The dataset used in this study consists of basic experimental images representing a phenomenon called droplet dispersion, which occurs inside an automobile exhaust pipe. It contains only about 800 images, and the target of detection is a form of droplet dispersal called bag-breakup. Natural scene images often contain dozens of detection targets each, and images with the targets of interest (cats, cars, etc.) are easy to obtain in large numbers. In contrast, our dataset typically has one, and at most two, detection targets in a single image, while the experiments can yield many thousands of images that contain no detection target at all.
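The imbalance described above (at most one or two targets per image, and many images with none) is the situation that the focal loss of RetinaNet was designed to handle: it down-weights the loss of easy, well-classified background examples so that they do not dominate training. The following is a minimal sketch in plain Python, assuming a binary (target vs. background) classification per candidate box and the commonly quoted defaults alpha = 0.25 and gamma = 2. The function names and the example probabilities are illustrative assumptions, not the training configuration used in this study.

```python
import math


def cross_entropy(p, y, eps=1e-12):
    """Binary cross-entropy for one prediction.

    p: predicted probability of the positive class (bag-breakup).
    y: ground-truth label, 1 for target and 0 for background.
    """
    p_t = p if y == 1 else 1.0 - p          # probability assigned to the true class
    p_t = min(max(p_t, eps), 1.0)           # clamp to avoid log(0)
    return -math.log(p_t)


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    alpha and gamma are assumed defaults, chosen here for illustration only.
    """
    p_t = p if y == 1 else 1.0 - p
    p_t = min(max(p_t, eps), 1.0)
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # Hypothetical predictions on two background boxes (label 0).
    easy_background = 0.01   # confidently and correctly rejected
    hard_background = 0.90   # background that looks like a bag-breakup
    for name, p in [("easy", easy_background), ("hard", hard_background)]:
        print(name, "CE:", cross_entropy(p, 0), "FL:", focal_loss(p, 0))
```

With gamma = 0 and alpha = 0.5 the focal loss reduces to half the cross-entropy; increasing gamma shrinks the contribution of the easy background example much more strongly than that of the hard one.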
Additionally, what makes the droplet dispersion images more challenging is that the detection targets (bag-breakups in our task) are very similar to the background in terms of visual appearance and texture patterns, while in natural scene images it is much easier to discriminate between, for example, cats and their background. Finally, in test mode, the trained model has to process an enormous volume of experimental data/images generated daily and input as a video stream to the system; real-time processing capability is therefore crucial. To address these issues, and to provide guidance to practitioners who might have to deal with similar challenges, we consider a set of solutions and architecture choices suitable for such a setting and investigate their effectiveness.

The rest of this paper is organized as follows. First, in the next section, we describe the concrete task which motivates our research. In Sect. 3, we discuss the most common problems and challenges which practitioners need to deal with when applying object detection methods in special domains and show how these can be overcome using corresponding practices and architecture decisions developed in recent state-of-the-art methods in the object detection field. Additionally, in Sect. 3.4 we propose a new method which, by utilizing information from several consecutive frames, further improves detection accuracy by eliminating false positives that might occur in regions which closely resemble the visual structure of bag-breakup patterns and are difficult even for ex (...truncated)