R²S100K: Road-Region Segmentation Dataset for Semi-supervised Autonomous Driving in the Wild

International Journal of Computer Vision
https://doi.org/10.1007/s11263-024-02207-3

Muhammad Atif Butt (1, 2) · Hassan Ali (2, 3) · Adnan Qayyum (2, 4) · Waqas Sultani (2) · Ala Al-Fuqaha (5) · Junaid Qadir (6)

Received: 10 August 2023 / Accepted: 27 July 2024
© The Author(s) 2024
Communicated by Hong Liu.

✉ Junaid Qadir

1 Computer Vision Center (CVC), Universitat Autònoma de Barcelona, Bellaterra, Cerdanyola del Vallès, Spain
2 Information Technology University of the Punjab, Lahore, Pakistan
3 University of New South Wales (UNSW), High St, Kensington NSW 2052, Sydney, Australia
4 Qatar University, Doha, Qatar
5 Information and Computing Technology (ICT) Division, College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
6 Department of Computer Science and Engineering, College of Engineering, Qatar University, Doha, Qatar

Abstract

Semantic understanding of roadways is a key enabling factor for safe autonomous driving. However, existing autonomous driving datasets provide well-structured urban roads while ignoring unstructured roadways containing distress, potholes, water puddles, and various kinds of road patches (e.g., earthen, gravel). To this end, we introduce the Road Region Segmentation dataset (R²S100K), a large-scale dataset and benchmark for training and evaluating road segmentation in the aforementioned challenging unstructured roadways. R²S100K comprises 100K images extracted from a large and diverse set of video sequences covering more than 1000 km of roadways. Of these 100K privacy-respecting images, 14,000 have fine pixel-level labels of road regions, and the remaining 86,000 are unlabeled images that can be leveraged through semi-supervised learning methods. Alongside, we present an Efficient Data Sampling based self-training framework to improve learning by leveraging unlabeled data. Our experimental results demonstrate that the proposed method significantly improves the generalizability of learning methods and reduces the labeling cost for semantic segmentation tasks. Our benchmark will be publicly available to facilitate future research at https://r2s100k.github.io/.

Keywords Autonomous driving · Semantic segmentation · Semi-supervised learning

1 Introduction

Visual perception for recognizing objects, obstacles, and pedestrians is a core building block for efficient autonomous driving. Semantic segmentation has emerged as an efficient perception method that aims to determine the semantic label of each pixel of an image (Siam et al., 2018). Thanks to the availability of rich scene segmentation datasets (discussed in Fig. 2), significant technical progress has been made in this direction. However, several formidable challenges still remain on the path to efficient autonomous driving in the wild.

Firstly, existing autonomous driving datasets (Brostow et al., 2009; Caesar et al., 2020; Cordts et al., 2016; Geiger et al., 2012; Sun et al., 2020; Yu et al., 2020) are not generalized; they cover well-paved urban roads of developed countries, which represent 3.7% of the world's road infrastructure (Schwab, 2019) and barely serve 17% of the world's total population (Gaigbe-Togbe et al., 2022). More recently, Segment Anything (Kirillov et al., 2023), the largest segmentation dataset with more than one billion masks for 11 million images, has been released to perform general-purpose segmentation tasks. However, despite being the largest in size, only 0.9% of its data samples come from low-income countries.
Therefore, these datasets have scant coverage of unstructured roadways containing hazardous road patches (i.e., distress, earthen, gravel) that are common in the developing world, as shown in Fig. 1. Such ambiguous road regions pose an enormous hazard to human drivers and lead to severe road accidents and fatalities. According to the World Health Organization (WHO), 1.3 million people die every year due to road accidents (WHO, 2020), with 93% of casualties occurring in low- and middle-income countries. The global road safety report points out that nonstandard road infrastructure is a key reason for higher road accident rates in these countries (WHO, 2019). Therefore, the under-representation of such challenging data in existing datasets is a critical omission for research on autonomous driving and indicates the need for a benchmark to improve autonomous driving in such challenging road scenarios.

Secondly, pixel-level annotation of images is excessively expensive: for Cityscapes, labeling an image took an hour on average (Cordts et al., 2016). This leads to smaller segmentation datasets than in other domains (Deng et al., 2009; Lin et al., 2014), consequently limiting the generalizability of the trained models. Although semi-supervised learning methods (Abdalla et al., 2019; He et al., 2019; Huang et al., 2018; Yu et al., 2022) have been proposed that leverage unlabeled data to improve learning, these methods suffer from several limitations: (i) segmentation datasets are often highly imbalanced, both in the pixel counts of each class (Rezaei et al., 2020) and in the physical scenarios in which the data are collected. Therefore, the resulting model performs significantly worse in physical scenarios that are not common (e.g.
rare weather conditions and unstructured roads), which can be lethal in autonomous driving; (ii) biased predictions caused by the data imbalance in the early semi-supervised training phase (He et al., 2019) lead to a higher misclassification rate during inference; (iii) self-training segmentation models are computationally very expensive due to the large number of pseudo-labels (Wei et al., 2018). In this regard, there is a need for an efficient method to improve performance while considering accuracy-energy trade-offs.

To address these challenges, we have made the following contributions:

1. We introduce the Road Region Segmentation (R²S100K) dataset for autonomous driving, comprising a diverse set of 100K road images covering 1000+ km of challenging roadways, as shown in Fig. 1. The R²S100K dataset covers more challenging road categories and scenarios than existing datasets. Moreover, R²S100K serves as an initial step in representing unstructured roads prevalent in low-income countries, allowing for more comprehensive stress-testing of foundational segmentation models for autonomous driving.
2. We propose an unsupervised Efficient Data Sampling (EDS) method to sample a subset from the unlabeled training data, which offers three benefits: (i) EDS notably alleviates the data imbalance in the physical scenarios, (ii) improves the performance of supervised (0.71–6.72% MIoU) and semi-supervised (0.26–1.84% MIoU) models, and (iii) significantly reduces the annotation and training costs (75% fewer pseudo-labels and 79% dec (...truncated)