Building Detection from SkySat Images with Transfer Learning: a Case Study over Ankara
PFG – Journal of Photogrammetry, Remote Sensing and Geoinformation Science (2024) 92:163–175
https://doi.org/10.1007/s41064-024-00279-x
ORIGINAL ARTICLE
Kanako Sawa1,2 · Ilyas Yalcin3,4 · Sultan Kocaman2
Received: 11 July 2023 / Accepted: 6 February 2024 / Published online: 18 March 2024
© The Author(s) 2024
Abstract
The detection and continuous updating of buildings in geodatabases has long been a major research area in geographic
information science and is an important theme for national mapping agencies. Advancements in machine learning techniques, particularly state-of-the-art deep learning (DL) models, offer promising solutions for extracting and modeling
building rooftops from images. However, tasks such as automatic labelling of learning data and the generalizability of
models remain challenging. In this study, we assessed the sensor and geographic area adaptation capabilities of a pretrained
DL model implemented in the ArcGIS environment using very-high-resolution (50 cm) SkySat imagery. The model was
trained for digitizing building footprints via Mask R-CNN with a ResNet50 backbone using aerial and satellite images
from parts of the USA. Here, we utilized images from three different SkySat satellites with various acquisition dates and
off-nadir angles and refined the pretrained model using small numbers of buildings as training data (5–53 buildings) over
Ankara. We evaluated the buildings in areas with different characteristics, such as urban transformation areas, slums, and regular settlements,
and obtained high accuracies with F-1 scores of 0.92, 0.94, and 0.96 from SkySat 4, 7, and 17, respectively. The study
findings showed that the DL model has high transfer learning capability for Ankara using only a few buildings and that
the recent SkySat satellites demonstrate superior image quality.
Keywords Building extraction · Deep Learning · SkySat Constellation · Geographic Information System · Fine-tuning
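The F-1 scores quoted in the abstract combine precision and recall into a single figure. As a minimal sketch, assuming the detected buildings have already been matched against reference buildings and counted as true positives, false positives, and false negatives, the score could be computed as follows. The function name and the example counts are hypothetical and are not taken from the study.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from detection counts.

    tp: detections that match a reference building
    fp: detections with no matching reference building
    fn: reference buildings that were not detected
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, for illustration only (not results from the study)
p, r, f1 = precision_recall_f1(tp=46, fp=4, fn=4)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Because F1 is a harmonic mean, it stays high only when both precision and recall are high, which makes it a reasonable single summary for building detection.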
1 Geospatial Information Authority of Japan, Tsukuba, Japan
2 Department of Geomatics Engineering, Hacettepe University, 06800 Beytepe Ankara, Turkey
3 Graduate School of Science and Engineering, Hacettepe University, 06800 Beytepe Ankara, Turkey
4 Baskent OSB Technical Sciences Vocational School, Hacettepe University, 06909 Sincan Ankara, Turkey
1 Introduction
Geographic information systems (GIS) and geographic information science (GIScience) enable collaborations and
facilitate interdisciplinary work between different domains
such as urban planning, resource management, and scientific disciplines (MacEachren 2000). They also promote volunteer participation in data collection and decision-making
(Sun and Li 2016) by providing suitable platforms for data
storage, access, analysis, and sharing (e.g., Can et al. 2020,
2021). Thanks to numerous online GIS platforms and open
data repositories (e.g., the reference lists on OpenStreetMap
Project (OpenStreetMap 2024), Open Aerial Map (2024),
mapbox (2024), and ArcGIS Online (ESRI 2024)), spatial analysis methods can be applied and tuned/configured even by
non-professionals in the geospatial domain. Within this context, Rowland et al. (2020) also highlighted the increasing
trend towards self-service applications among users, focusing not only on visualization and interactivity but also on
analytics and usability features. GIS platforms also help to
synthesize efforts of diverse science and engineering disciplines. A primary application domain has been smart cities.
Collaboration among engineers, architects, computer scientists, urban planners, policymakers, decision-makers, and
the general public is essential in this context
(Buyukdemircioglu and Kocaman 2022).
The geometric and semantic updating of geodatabases
is crucial to ensure their usability, and this task has long
been a primary responsibility of national mapping agencies.
Buildings in particular are subject to frequent changes due
to construction or demolition. The traditional approach of
manual updating and mapping is highly challenging and
time-consuming. GIS platforms have also facilitated crowdsourcing methodologies, such as volunteered
geographic information (VGI) and citizen science. These
approaches contribute to the collection and interpretation
of geodata, and several studies exploring this aspect exist in the literature (e.g., see Chen and Zipf 2017; Fan
et al. 2021; Can et al. 2020, 2021). Furthermore, recent advancements in deep-learning (DL) algorithms, particularly
convolutional neural networks (CNN), have demonstrated
significant potential for automatic detection and updating
of various geospatial data including land use/land cover
(LULC) types. Their efficiency has been demonstrated for updating urban structures
(Chen et al. 2021), building footprints (Neupane et al.
2021; Buyukdemircioglu et al. 2021, 2022a, b), agricultural
fields (Victor et al. 2022), complex LULC and topography
(Sertel et al. 2022), and other related applications (Hoeser
and Kuenzer 2020). Nevertheless, an
ongoing challenge remains in ensuring the practicality and
applicability of developed DL models and datasets across
diverse user typologies and various geographical contexts.
As part of its commercial endeavors, the Environmental Systems Research Institute, Inc. (ESRI) in Redlands,
California, has integrated pretrained CNN models for extraction and classification of various features, such as building footprints, cars, trees, ships, and railways, from images
and point clouds obtained from optical imaging, radar, and
light detection and ranging (LiDAR) sensors within its
software environment (ESRI 2023a). The software is accessible to a user base exceeding 350,000 organizations,
encompassing users of ArcGIS Desktop, Enterprise, and
Online, as of the time of writing (ESRI 2023b). With its
user-friendly interfaces, it is possible to execute the DL
applications within this software environment without requiring advanced coding skills. Hence, domain specialists who
are not DL experts can readily use these models in their analyses. They also have the
option to fine-tune the models with a small dataset specific
to their application area. This flexibility depends on the
domain adaptation and transfer learning capabilities inherent to the model. Thus, by using the tools provided on the
ArcGIS platform, a diverse range of users including local
government personnel responsible for geodatabase updates
and planning can enhance the performance of the pretrained
DL models in specific application areas and increase their accuracy and reliability.
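The idea of refining a pretrained model with only a handful of local samples can be illustrated with a deliberately small toy example. The sketch below is not the ArcGIS workflow and not Mask R-CNN: it assumes a one-feature logistic classifier whose weight was "pretrained" elsewhere, keeps that weight frozen, and adjusts only the bias on four labelled target samples. All names, values, and samples are hypothetical.

```python
import math


def predict(weights, bias, x):
    """Probability that sample x belongs to the 'building' class."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(weights, bias, samples):
    """Mean binary cross-entropy over (features, label) pairs."""
    eps = 1e-12
    total = 0.0
    for x, y in samples:
        p = min(max(predict(weights, bias, x), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(samples)


def fine_tune(weights, bias, samples, lr=0.5, epochs=200, freeze_weights=True):
    """Gradient descent on a small target set.

    With freeze_weights=True only the bias is updated, so the
    pretrained weights are reused unchanged.
    """
    weights = list(weights)
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in samples:
            err = predict(weights, bias, x) - y
            grad_b += err
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
        bias -= lr * grad_b / n
        if not freeze_weights:
            weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


# Hypothetical "pretrained" parameters: decision threshold at feature value 0.5
pretrained_weights, pretrained_bias = [4.0], -2.0

# Hypothetical target-domain samples whose feature values are shifted upward,
# standing in for a different sensor or region
target_samples = [([0.6], 0), ([0.7], 0), ([0.9], 1), ([1.0], 1)]

loss_before = log_loss(pretrained_weights, pretrained_bias, target_samples)
tuned_weights, tuned_bias = fine_tune(pretrained_weights, pretrained_bias, target_samples)
loss_after = log_loss(tuned_weights, tuned_bias, target_samples)
print(f"loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

Freezing most parameters and updating only a small part of the model is one common way to limit overfitting when very few labelled samples are available; which layers a real tool updates during fine-tuning depends on that tool and its settings.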
Transfer learning is an approach to apply knowledge and
skills learned in previous tasks to new tasks (Pan and Yang
2010). If knowledge transfer between tasks is d (...truncated)