DUDES: Deep Uncertainty Distillation using Ensembles for Semantic Segmentation
PFG – Journal of Photogrammetry, Remote Sensing and Geoinformation Science (2024) 92:101–114
https://doi.org/10.1007/s41064-024-00280-4
ORIGINAL ARTICLE
Steven Landgraf1 · Kira Wursthorn1 · Markus Hillemann1 · Markus Ulrich1
1 Institute of Photogrammetry and Remote Sensing (IPF), Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
Received: 27 November 2023 / Accepted: 1 February 2024 / Published online: 25 March 2024
© The Author(s) 2024
Abstract
The intersection of deep learning and photogrammetry reveals a critical need to balance the power of deep neural networks with interpretability and trustworthiness, especially for safety-critical applications such as autonomous driving, medical imaging, or machine vision tasks with high demands on reliability. Quantifying the predictive uncertainty is a promising way to open up the use of deep neural networks for such applications. Unfortunately, most currently available methods are computationally expensive. In this work, we present a novel approach for efficient and reliable uncertainty estimation for semantic segmentation, which we call Deep Uncertainty Distillation using Ensembles for Segmentation (DUDES). DUDES applies student-teacher distillation with a Deep Ensemble to accurately approximate predictive uncertainties with a single forward pass while maintaining simplicity and adaptability. Experimentally, DUDES accurately captures predictive uncertainties without sacrificing performance on the segmentation task and shows a strong ability to highlight wrongly classified pixels and out-of-domain samples through high uncertainties on the Cityscapes and Pascal VOC 2012 datasets. With DUDES, we manage to simultaneously simplify and outperform previous work on Deep-Ensemble-based Uncertainty Distillation.
Keywords Deep Learning · Semantic Segmentation · Uncertainty Quantification · Deep Ensemble · Knowledge Distillation
1 Introduction
In recent years, approaches based on deep neural networks have become the most popular and successful solution for semantic segmentation problems (Minaee et al. 2022). Despite their unrivaled performance on established benchmark datasets like Cityscapes (Cordts et al. 2016) or PASCAL VOC (Everingham et al. 2010), neural networks lack interpretability (Gawlikowski et al. 2022), are unable to distinguish between in-domain and out-of-domain samples (Lee et al. 2018), and tend to be overconfident (Guo et al. 2017).
These shortcomings are especially severe for safety-critical applications such as autonomous driving (McAllister et al. 2017) and medical image analysis (Leibig et al. 2017), as well as for computer vision tasks with high demands on reliability, such as industrial inspection (Steger et al. 2018) and automation (Ulrich and Hillemann 2023).
Quantifying the predictive uncertainty is a promising endeavour to make such applications safer and more reliable,
e.g., by preemptively making risk-averse predictions or by
providing feedback to a human operator when predictions
are uncertain. Some of the most relevant methods include
Bayesian Neural Networks (MacKay 1992), Monte Carlo
Dropout (Gal and Ghahramani 2016), and Deep Ensembles (Lakshminarayanan et al. 2017). Unfortunately, most
methods require a computationally expensive estimation of
a distribution of outputs by sampling from a stochastic process. Recently, the concept of knowledge distillation has
been introduced as a potential solution (Shen et al. 2021;
Besnier et al. 2021; Holder and Shafique 2021; Simpson
et al. 2022). Knowledge distillation is a technique for transferring the knowledge embodied in a complex model, referred to as the teacher, to a smaller model, referred to as the student. By incorporating the knowledge learned by a more complex model, the student's performance can be enhanced (Hinton et al. 2015; Romero et al. 2015; Malinin et al. 2019).
In this work, we present a novel approach for efficient and reliable uncertainty quantification, which we call Deep Uncertainty Distillation using Ensembles for Segmentation (DUDES), as shown in Fig. 1. DUDES applies student-teacher distillation with a Deep Ensemble to accurately approximate predictive uncertainties while maintaining simplicity and adaptability. In contrast to the Deep Ensemble teacher, the student needs only a single forward pass to obtain predictive uncertainties, which substantially reduces the inference time and eliminates the computational overhead of handling multiple models and forward passes. DUDES simultaneously simplifies and outperforms previous work on Deep-Ensemble-based uncertainty distillation, which we evaluate experimentally on the Cityscapes and Pascal VOC 2012 datasets.
After an overview of the related work on uncertainty quantification and knowledge distillation in Sect. 2, the methodology of DUDES is described in Sect. 3. In Sect. 4, we demonstrate the capabilities of DUDES through quantitative and qualitative analysis and qualitatively investigate how well the predictive uncertainties identify wrongly classified pixels or out-of-domain samples. In Sect. 5, we provide an extended set of experiments with a Vision-Transformer-based architecture, a different uncertainty quantification method for the teacher, and a different dataset to demonstrate the generalizability of DUDES. Sect. 6 discusses the experimental results and their potential impact on future research. Sect. 7 concludes the paper.
Fig. 1 DUDES applies student-teacher distillation with a Deep Ensemble (DE) to accurately approximate predictive uncertainties with a single forward pass while maintaining simplicity and adaptability
2 Related Work
In this section, we summarize the related work on uncertainty quantification and knowledge distillation.
2.1 Uncertainty Quantification
Deep neural networks consist of a large number of model parameters and include non-linearities, which generally makes the exact posterior probability distribution of a network's output prediction intractable (Blundell et al. 2015; Loquercio et al. 2020). Consequently, uncertainty quantification relies on approximate approaches, including softmax probabilities, Bayesian techniques such as Bayesian Neural Networks (MacKay 1992) and Monte Carlo Dropout (Gal and Ghahramani 2016), and Deep Ensembles (Lakshminarayanan et al. 2017).
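To make the Deep Ensemble approach concrete, the following is a minimal NumPy sketch of how the outputs of several ensemble members can be combined into a per-pixel segmentation and uncertainty map. It assumes that the raw logits of M members are stacked into an array of shape (M, C, H, W) and uses the predictive entropy of the averaged softmax probabilities as the uncertainty measure; this is one common choice and not necessarily the measure used by DUDES. The function names are hypothetical.

```python
import numpy as np


def softmax(logits: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given (class) axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def ensemble_uncertainty(member_logits: np.ndarray):
    """Combine M ensemble members for semantic segmentation (hypothetical helper).

    member_logits: array of shape (M, C, H, W) with the raw logits of
    M independently trained networks for C classes on an H x W image.
    Returns the per-pixel class prediction (H, W), the mean softmax
    probabilities (C, H, W), and the predictive entropy (H, W).
    """
    probs = softmax(member_logits, axis=1)      # per-member probabilities, (M, C, H, W)
    mean_probs = probs.mean(axis=0)             # ensemble average, (C, H, W)
    prediction = mean_probs.argmax(axis=0)      # most likely class per pixel, (H, W)
    eps = 1e-12                                 # guards against log(0)
    entropy = -(mean_probs * np.log(mean_probs + eps)).sum(axis=0)  # (H, W)
    return prediction, mean_probs, entropy


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 5 members, 19 classes, a tiny 4 x 4 "image" of random logits
    pred, mean_probs, entropy = ensemble_uncertainty(rng.normal(size=(5, 19, 4, 4)))
    print(pred.shape, entropy.shape)
```

Because the softmax outputs are averaged before the entropy is taken, pixels on which the members disagree receive a high uncertainty even if every member is individually confident. Obtaining this map requires M forward passes, which is the computational overhead that a distilled student would avoid by approximating the teacher's uncertainty in a single pass.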
While softmax predictions are easy to obtain, they tend to be overconfident and need to be calibrated to produce reliable confidence estimates, i.e., estimates for which the predicted probability agrees with the actual likelihood of being correct (Guo et al. 2017). Additionally, softmax predictions
are often erroneously interpreted as model confidence (Gal
and Ghahramani 2016). (...truncated)