Building Detection from SkySat Images with Transfer Learning: a Case Study over Ankara

PFG – Journal of Photogrammetry, Remote Sensing and Geoinformation Science (2024) 92:163–175
https://doi.org/10.1007/s41064-024-00279-x

ORIGINAL ARTICLE

Kanako Sawa1,2 · Ilyas Yalcin3,4 · Sultan Kocaman2

Received: 11 July 2023 / Accepted: 6 February 2024 / Published online: 18 March 2024
© The Author(s) 2024

Abstract

The detection and continuous updating of buildings in geodatabases has long been a major research area in geographic information science and is an important theme for national mapping agencies. Advancements in machine learning techniques, particularly state-of-the-art deep learning (DL) models, offer promising solutions for extracting and modeling building rooftops from images. However, tasks such as automatic labelling of learning data and the generalizability of models remain challenging. In this study, we assessed the sensor and geographic area adaptation capabilities of a pretrained DL model implemented in the ArcGIS environment using very-high-resolution (50 cm) SkySat imagery. The model was trained for digitizing building footprints via Mask R-CNN with a ResNet50 backbone using aerial and satellite images from parts of the USA. Here, we utilized images from three different SkySat satellites with various acquisition dates and off-nadir angles and refined the pretrained model using small numbers of buildings as training data (5–53 buildings) over Ankara. We evaluated the buildings in areas with different characteristics, such as urban transformation, slum, and regular areas, and obtained high accuracies with F-1 scores of 0.92, 0.94, and 0.96 from SkySat 4, 7, and 17, respectively. The study findings showed that the DL model has high transfer learning capability for Ankara using only a few buildings and that the recent SkySat satellites demonstrate superior image quality.
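For readers unfamiliar with the F-1 score reported above, the following minimal sketch shows how it is commonly derived from counts of true positives, false positives, and false negatives. The function name and the building counts are hypothetical and chosen for illustration only; they are not taken from the study, and this excerpt does not state whether the evaluation was object-based or pixel-based.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F-1 score from detection counts.

    tp: correctly detected buildings (true positives)
    fp: detections with no matching reference building (false positives)
    fn: reference buildings that were missed (false negatives)
    """
    if tp == 0:
        # No correct detections: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)  # share of detections that are correct
    recall = tp / (tp + fn)     # share of reference buildings that were found
    # F-1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, not values from the article:
# 46 correct detections, 4 false alarms, 4 missed buildings.
score = f1_score(tp=46, fp=4, fn=4)
print(f"F-1 = {score:.2f}")
```

Because the F-1 score is a harmonic mean, it is high only when both precision and recall are high, which makes it a stricter summary than either measure alone.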
Keywords Building extraction · Deep Learning · SkySat Constellation · Geographic Information System · Fine-tuning

Author affiliations
1 Geospatial Information Authority of Japan, Tsukuba, Japan
2 Department of Geomatics Engineering, Hacettepe University, 06800 Beytepe Ankara, Turkey
3 Graduate School of Science and Engineering, Hacettepe University, 06800 Beytepe Ankara, Turkey
4 Baskent OSB Technical Sciences Vocational School, Hacettepe University, 06909 Sincan Ankara, Turkey

1 Introduction

Geographic information systems (GIS) and geographic information science (GIScience) enable collaborations and facilitate interdisciplinary work between different domains such as urban planning, resource management, and scientific disciplines (MacEachren 2000). They also promote volunteer participation in data collection and decision-making (Sun and Li 2016) by providing suitable platforms for data storage, access, analysis, and sharing (e.g., Can et al. 2020, 2021). Thanks to numerous online GIS platforms and open data repositories (e.g., the reference lists on OpenStreetMap Project (OpenStreetMap 2024), Open Aerial Map (2024), mapbox (2024), and ArcGIS Online (ESRI 2024)), spatial analysis methods can be applied and tuned or configured even by non-professionals in the geospatial domain. Within this context, Rowland et al. (2020) also highlighted the increasing trend towards self-service applications among users, focusing not only on visualization and interactivity but also on analytics and usability features.

GIS platforms also help to synthesize the efforts of diverse science and engineering disciplines. A primary application domain has been smart cities, where collaboration among engineers, architects, computer scientists, urban planners, policymakers, decision-makers, and the general public is essential (Buyukdemircioglu and Kocaman 2022).
The geometric and semantic updating of geodatabases is crucial to ensure their usability, and this task has long been a primary responsibility of national mapping agencies. Buildings in particular are subject to frequent changes due to construction or demolition. The traditional approach of manual updating and mapping is highly challenging and time-consuming. As an enabler, GIS platforms have also facilitated crowdsourcing methodologies, such as volunteered geographic information (VGI) and citizen science. These approaches contribute to the collection and interpretation of geodata, and several studies exploring this aspect exist in the literature (e.g., see Chen and Zipf 2017; Fan et al. 2021; Can et al. 2020, 2021).

Furthermore, recent advancements in deep-learning (DL) algorithms, particularly convolutional neural networks (CNN), have demonstrated significant potential for automatic detection and updating of various geospatial data including land use/land cover (LULC) types. Their efficiency in updating urban structures (Chen et al. 2021) and building footprints (Neupane et al. 2021; Buyukdemircioglu et al. 2021, 2022a, b), agricultural fields (Victor et al. 2022), complex LULC and topography (Sertel et al. 2022), and other related applications (Hoeser and Kuenzer 2020) has been demonstrated. Nevertheless, an ongoing challenge remains in ensuring the practicality and applicability of developed DL models and datasets across diverse user typologies and various geographical contexts.

As part of its commercial endeavors, the Environmental Systems Research Institute, Inc. (ESRI) in Redlands, California, has integrated pretrained CNN models for extraction and classification of various features, such as building footprints, cars, trees, ships, railways, etc.
from images and point clouds obtained from optical imaging, radar, and light detection and ranging (LiDAR) sensors within their software environment (ESRI 2023a). The software is accessible to a user base exceeding 350,000 organizations, encompassing users of ArcGIS Desktop, Enterprise, and Online, as of the time of writing (ESRI 2023b). With its user-friendly interfaces, the DL applications can be executed within this software environment without advanced coding skills. Hence, individuals who are not DL experts can readily utilize these models in their analyses. They also have the option to fine-tune the models with a small dataset specific to their application area. This flexibility depends on the domain adaptation and transfer learning capabilities inherent to the model. Thus, by using the tools provided on the ArcGIS platform, a diverse range of users, including local government personnel responsible for geodatabase updates and planning, can enhance the performance of the pretrained DL models in specific application areas and increase their accuracy and reliability.

Transfer learning is an approach to apply knowledge and skills learned in previous tasks to new tasks (Pan and Yang 2010). If knowledge transfer between tasks is d (...truncated)