R²S100K: Road-Region Segmentation Dataset for Semi-supervised Autonomous Driving in the Wild
International Journal of Computer Vision
https://doi.org/10.1007/s11263-024-02207-3
Muhammad Atif Butt1,2 · Hassan Ali2,3 · Adnan Qayyum2,4 · Waqas Sultani2 · Ala Al-Fuqaha5 · Junaid Qadir6
Received: 10 August 2023 / Accepted: 27 July 2024
© The Author(s) 2024
Abstract
Semantic understanding of roadways is a key enabling factor for safe autonomous driving. However, existing autonomous
driving datasets cover well-structured urban roads while ignoring unstructured roadways containing distress, potholes, water
puddles, and various kinds of road patches (e.g., earthen and gravel). To this end, we introduce the Road Region Segmentation
dataset (R²S100K), a large-scale dataset and benchmark for training and evaluating road segmentation on such
challenging unstructured roadways. R²S100K comprises 100K images extracted from a large and diverse set of video sequences
covering more than 1000 km of roadways. Of these 100K privacy-respecting images, 14,000 have fine pixel-level labels
of road regions, and the remaining 86,000 are unlabeled and can be leveraged through semi-supervised learning methods.
We also present an Efficient Data Sampling based self-training framework that improves learning by leveraging unlabeled data.
Our experimental results demonstrate that the proposed method significantly improves the generalizability of learning
methods and reduces the labeling cost for semantic segmentation tasks. To facilitate future research, our benchmark will be
publicly available at https://r2s100k.github.io/.
Keywords Autonomous driving · Semantic segmentation · Semi-supervised learning
Communicated by Hong Liu.
✉ Junaid Qadir
Muhammad Atif Butt
Hassan Ali
Adnan Qayyum
Waqas Sultani
Ala Al-Fuqaha
1 Computer Vision Center (CVC), Universitat Autònoma de Barcelona, Bellaterra, Cerdanyola del Vallès, Spain
2 Information Technology University of the Punjab, Lahore, Pakistan
3 University of New South Wales (UNSW), High St, Kensington NSW 2052, Sydney, Australia
4 Qatar University, Doha, Qatar
5 Information and Computing Technology (ICT) Division, College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
6 Department of Computer Science and Engineering, College of Engineering, Qatar University, Doha, Qatar
1 Introduction
Visual perception for recognizing objects, obstacles, and
pedestrians is a core building block for efficient autonomous
driving. Semantic segmentation has emerged as an efficient
perception method that aims to determine a semantic label
for each pixel of an image (Siam et al., 2018). Thanks to the
availability of rich scene segmentation datasets (see
Fig. 2), significant technical progress has been made in
this direction. However, several formidable challenges
remain on the path to efficient autonomous driving in the
wild.
Firstly, existing autonomous driving datasets (Brostow et
al., 2009; Caesar et al., 2020; Cordts et al., 2016; Geiger
et al., 2012; Sun et al., 2020; Yu et al., 2020) do not
generalize well: they cover the well-paved urban roads of developed
countries, which represent 3.7% of the world's road infrastructure
(Schwab, 2019) and serve barely 17% of
the world's population (Gaigbe-Togbe et al., 2022).
More recently, Segment Anything (Kirillov et al., 2023),
the largest segmentation dataset, with more than one billion
masks for 11 million images, was released for
general-purpose segmentation tasks. However, despite its
size, only 0.9% of its data samples come from
low-income countries. As a result, these datasets have scant
coverage of unstructured roadways containing hazardous
road patches (such as distressed, earthen, and gravel surfaces) that are common
in the developing world, as shown in Fig. 1. Such ambiguous road regions
pose an enormous hazard to human drivers
and lead to severe road accidents and fatalities. According
to the World Health Organization (WHO), 1.3 million people
die every year in road accidents (WHO, 2020), with
93% of casualties occurring in low- and middle-income
countries. The global road safety report identifies non-standard
road infrastructure as a key reason for the higher road
accident rates in these countries (WHO, 2019). The
under-representation of such challenging data in existing
datasets is therefore a critical omission in research on autonomous
driving and indicates the need for a benchmark
to improve autonomous driving in such challenging road
scenarios.
Secondly, pixel-level annotation of images is excessively
expensive (for Cityscapes, labeling an image took an hour
on average (Cordts et al., 2016)), leading to smaller segmentation
datasets than in other domains (Deng et al., 2009;
Lin et al., 2014) and consequently limiting the generalizability
of the trained models. Although semi-supervised learning
methods (Abdalla et al., 2019; He et al., 2019; Huang et
al., 2018; Yu et al., 2022) have been proposed to leverage
unlabeled data and improve learning, these methods have three
limitations: (i) segmentation datasets are often highly
imbalanced, both in the pixel counts of each
class (Rezaei et al., 2020) and in the physical scenarios in
which the data are collected, so the resulting model
performs significantly worse in uncommon physical scenarios
(e.g., rare weather conditions and unstructured
roads), which can be lethal in autonomous driving; (ii) biased
predictions caused by data imbalance in the early
semi-supervised training phase (He et al., 2019) lead to a higher
misclassification rate during inference; and (iii) self-training
segmentation models are computationally very expensive due to
the large number of pseudo-labels (Wei et al., 2018). There is
therefore a need for an efficient method that improves
performance while accounting for accuracy-energy trade-offs.
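The two ideas in the paragraph above, pixel-level class imbalance and the volume of pseudo-labels produced in self-training, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 0.9 confidence threshold, and the ignore index 255 are illustrative assumptions, and confidence thresholding is only one common way to generate pseudo-labels.

```python
from collections import Counter


def class_pixel_fractions(masks):
    """Return the fraction of pixels per class over a list of 2-D label masks.

    Each mask is a list of rows; each row is a list of integer class ids.
    """
    counts = Counter()
    for mask in masks:
        for row in mask:
            counts.update(row)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def pseudo_label(prob_map, threshold=0.9, ignore_index=255):
    """Convert per-pixel class probabilities into pseudo-labels.

    prob_map[y][x] is a list of class probabilities for one pixel.
    Pixels whose top probability is below `threshold` (an assumed value)
    receive `ignore_index` so that they can be skipped during training.
    """
    labels = []
    for row in prob_map:
        out_row = []
        for probs in row:
            best = max(range(len(probs)), key=probs.__getitem__)
            out_row.append(best if probs[best] >= threshold else ignore_index)
        labels.append(out_row)
    return labels


if __name__ == "__main__":
    # Toy mask: class 0 dominates, class 1 is rare.
    masks = [[[0, 0, 0, 1], [0, 0, 0, 0]]]
    print(class_pixel_fractions(masks))

    # Toy model output for two pixels and two classes.
    probs = [[[0.95, 0.05], [0.6, 0.4]]]
    print(pseudo_label(probs))
```

In this toy input, only the confident pixel keeps a pseudo-label, which shows how the number of retained pseudo-labels, and hence the training cost, depends on the selection rule.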
To address these challenges, we have made the following
contributions:
1. We introduce the Road Region Segmentation (R²S100K)
dataset for autonomous driving, comprising a diverse set of
100K road images covering more than 1000 km of challenging
roadways, as shown in Fig. 1. The R²S100K dataset covers
more challenging road categories and scenarios than
existing datasets. Moreover, R²S100K serves as an initial
step in representing the unstructured roads prevalent in
low-income countries, allowing more comprehensive
stress-testing of foundational segmentation models
for autonomous driving.
2. We propose an unsupervised Efficient Data Sampling
(EDS) method to sample a subset from the unlabeled
training data. EDS offers three benefits: it (i) notably
alleviates the data imbalance across physical scenarios,
(ii) improves the performance of supervised (0.71–6.72%
MIoU) and semi-supervised (0.26–1.84% MIoU) models,
and (iii) significantly reduces the annotation and training
costs (75% fewer pseudo-labels and 79% dec (...truncated)