DUDES: Deep Uncertainty Distillation using Ensembles for Semantic Segmentation


PFG – Journal of Photogrammetry, Remote Sensing and Geoinformation Science (2024) 92:101–114
https://doi.org/10.1007/s41064-024-00280-4

ORIGINAL ARTICLE

Steven Landgraf¹ · Kira Wursthorn¹ · Markus Hillemann¹ · Markus Ulrich¹

Received: 27 November 2023 / Accepted: 1 February 2024 / Published online: 25 March 2024
© The Author(s) 2024

Abstract

The intersection of deep learning and photogrammetry unveils a critical need for balancing the power of deep neural networks with interpretability and trustworthiness, especially for safety-critical applications like autonomous driving, medical imaging, or machine vision tasks with high demands on reliability. Quantifying the predictive uncertainty is a promising endeavour to open up the use of deep neural networks for such applications. Unfortunately, most currently available methods are computationally expensive. In this work, we present a novel approach for efficient and reliable uncertainty estimation for semantic segmentation, which we call Deep Uncertainty Distillation using Ensembles for Segmentation (DUDES). DUDES applies student-teacher distillation with a Deep Ensemble to accurately approximate predictive uncertainties with a single forward pass while maintaining simplicity and adaptability. Experimentally, DUDES accurately captures predictive uncertainties without sacrificing performance on the segmentation task and indicates impressive capabilities for highlighting wrongly classified pixels and out-of-domain samples through high uncertainties on the Cityscapes and Pascal VOC 2012 datasets. With DUDES, we manage to simultaneously simplify and outperform previous work on Deep-Ensemble-based Uncertainty Distillation.
Keywords Deep Learning · Semantic Segmentation · Uncertainty Quantification · Deep Ensemble · Knowledge Distillation

¹ Institute of Photogrammetry and Remote Sensing (IPF), Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany

1 Introduction

In recent years, approaches based on deep neural networks have become the most popular and successful solution for semantic segmentation problems (Minaee et al. 2022). Despite their unrivaled performance on established benchmark datasets like Cityscapes (Cordts et al. 2016) or PASCAL VOC (Everingham et al. 2010), neural networks lack interpretability (Gawlikowski et al. 2022), are unable to distinguish between in-domain and out-of-domain samples (Lee et al. 2018), and tend to be overconfident (Guo et al. 2017). These shortcomings are especially severe for safety-critical applications like autonomous driving (McAllister et al. 2017) and the analysis of medical imaging (Leibig et al. 2017) or computer vision tasks that have high demands on reliability like industrial inspection (Steger et al. 2018) and automation (Ulrich and Hillemann 2023).

Quantifying the predictive uncertainty is a promising endeavour to make such applications safer and more reliable, e.g., by preemptively making risk-averse predictions or by providing feedback to a human operator when predictions are uncertain. Some of the most relevant methods include Bayesian Neural Networks (MacKay 1992), Monte Carlo Dropout (Gal and Ghahramani 2016), and Deep Ensembles (Lakshminarayanan et al. 2017). Unfortunately, most methods require a computationally expensive estimation of a distribution of outputs by sampling from a stochastic process. Recently, the concept of knowledge distillation has been introduced as a potential solution (Shen et al. 2021; Besnier et al. 2021; Holder and Shafique 2021; Simpson et al. 2022).
Knowledge distillation is a technique for transferring the knowledge embodied in a complex model, referred to as the teacher, to a smaller model, referred to as the student. By incorporating the knowledge learned by a more complex model, the student's performance can be enhanced (Hinton et al. 2015; Romero et al. 2015; Malinin et al. 2019).

In this work, we present a novel approach for efficient and reliable uncertainty quantification, which we call Deep Uncertainty Distillation using Ensembles for Segmentation (DUDES) as shown in Fig. 1. DUDES applies student-teacher distillation with a Deep Ensemble to accurately approximate predictive uncertainties while maintaining simplicity and adaptability. In comparison to the Deep Ensemble teacher, the student only needs a single forward pass to obtain predictive uncertainties, which massively reduces the inference time and eliminates the computational overhead that is associated with having to deal with multiple models and forward passes. DUDES simultaneously simplifies and outperforms previous work on Deep-Ensemble-based uncertainty distillation, which we experimentally evaluate on the Cityscapes and Pascal VOC 2012 datasets.

Fig. 1 DUDES applies student-teacher distillation with a Deep Ensemble (DE) to accurately approximate predictive uncertainties with a single forward pass while maintaining simplicity and adaptability

After an overview of the related work on uncertainty quantification and knowledge distillation in Sect. 2, the methodology of DUDES is described in Sect. 3. In Sect. 4, we demonstrate the ability of DUDES through quantitative and qualitative analysis and qualitatively investigate the potential for identifying wrongly classified pixels or out-of-domain samples with the help of the predictive uncertainties. Thereafter, we provide an extended set of experiments with a Vision-Transformer-based architecture, a different uncertainty quantification method for the teacher, and a different dataset to demonstrate the generalizability of DUDES in Sect. 5. Sect. 6 discusses the experimental results and their potential impact on future research. Sect. 7 concludes the paper.

2 Related Work

In this section, we summarize the related work on uncertainty quantification and knowledge distillation.

2.1 Uncertainty Quantification

Deep neural networks consist of a large number of model parameters and include non-linearities, which generally makes the exact posterior probability distribution of a network's output prediction intractable (Blundell et al. 2015; Loquercio et al. 2020). This leads to approximate uncertainty quantification approaches including softmax probability, Bayesian techniques like Bayesian Neural Networks (MacKay 1992), Monte Carlo Dropout (Gal and Ghahramani 2016), and Deep Ensembles (Lakshminarayanan et al. 2017). While the softmax predictions are easy to implement, they tend to be overconfident and need to be calibrated in order to produce reliable confidence predictions where the predicted probability and the actual likelihood are in agreement (Guo et al. 2017). Additionally, softmax predictions are often erroneously interpreted as model confidence (Gal and Ghahramani 2016). (...truncated)
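
As an illustration of how a Deep Ensemble yields a predictive uncertainty that a single softmax output does not, the following minimal sketch combines the class logits of several ensemble members for one pixel. It is not taken from the paper: the function names, the toy logits, and the two uncertainty measures (entropy of the mean softmax distribution and standard deviation of the predicted-class probability across members) are assumptions chosen because they are common in the literature, and this excerpt does not state which measure DUDES uses.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_uncertainty(member_logits):
    """Combine the logits of M ensemble members for ONE pixel.

    member_logits: list of M logit vectors, one per ensemble member.
    Returns (predicted_class, mean_probs, entropy, std_pred), where
    entropy and std_pred are two assumed uncertainty measures.
    """
    probs = [softmax(logits) for logits in member_logits]
    n_members = len(probs)
    n_classes = len(probs[0])

    # Average the member distributions class by class.
    mean_probs = [sum(p[c] for p in probs) / n_members for c in range(n_classes)]
    pred = max(range(n_classes), key=lambda c: mean_probs[c])

    # Entropy of the averaged distribution (in nats).
    entropy = -sum(p * math.log(p) for p in mean_probs if p > 0.0)

    # Spread of the predicted-class probability across the members.
    var = sum((p[pred] - mean_probs[pred]) ** 2 for p in probs) / n_members
    return pred, mean_probs, entropy, math.sqrt(var)


# Hypothetical toy logits for one pixel, three classes, three members.
agree = [[4.0, 0.0, 0.0], [3.8, 0.1, 0.0], [4.2, 0.0, 0.1]]
disagree = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]

pred_a, mean_a, ent_a, std_a = ensemble_uncertainty(agree)
pred_d, mean_d, ent_d, std_d = ensemble_uncertainty(disagree)
print("agree:    entropy=%.3f std=%.3f" % (ent_a, std_a))
print("disagree: entropy=%.3f std=%.3f" % (ent_d, std_d))
```

In the second toy case every member is individually confident in its own class, so each single softmax output looks reliable, while the ensemble-level measures are high because the members contradict each other. A segmentation model would apply the same computation to every pixel, which requires one forward pass per ensemble member.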