Understanding tourists’ urban images with geotagged photos using convolutional neural networks
Spat. Inf. Res.
https://doi.org/10.1007/s41324-019-00285-x
Dongeun Kim1 • Youngok Kang1 • Yerim Park1 • Nayeon Kim1 • Juyoon Lee1
Received: 4 March 2019 / Revised: 21 July 2019 / Accepted: 25 July 2019
© The Author(s) 2019
Abstract This study aims to identify the representative
images and elements of sightseeing attractions by analyzing photos uploaded to Flickr by tourists in Seoul with an
image mining technique. For this purpose, we crawled the
photos uploaded to Flickr and classified users into residents and tourists; delineated 11 regions of attraction (RoA) in
Seoul by analyzing the spatial density of the photos; classified the photos into 1000 categories with the Inception v3
model and then grouped those 1000 categories into 14 categories; and analyzed the characteristics of the photo images
by RoA. The key findings are that tourists are
interested in old palaces, historical monuments, stores,
food, etc., and that these key elements differ among
the major sightseeing attractions in Seoul. More specifically, tourists are more interested in palaces and cultural
assets in Jongno and Namsan; food and restaurants in
Shinchon, Hongdae, Itaewon, Yeouido, Garosu-gil, and
Apgujeong; war monuments or specific artifacts at the War
Memorial and the National Museum of Korea; facilities,
temples, and pictures of cultural properties around Samsung
Station; and toyshops in Jamsil. This study is meaningful in
three respects. First, it analyzes urban image through the
photos posted on SNS by tourists. Second, it uses a deep
learning technique to analyze the photos. Third, it classifies
and analyzes all of the photos posted by Seoul tourists,
whereas most other studies focus only on specific
objects. However, this study has a limitation: the
Inception v3 model used in this research is
a pre-trained model that was trained on the ImageNet data.
In future research, it is necessary to classify photo categories according to the purpose of tourism and to retrain the
model on a new training data set focusing on elements of Korea.
✉ Youngok Kang
1 Department of Social Studies, Ewha Womans University,
Seoul, South Korea
Keywords Image data mining · Geotagged photos · Flickr · Convolutional neural network · Inception v3 model
1 Introduction
Today people like to share posts such as text, images, and videos with
others via Social Network Services (SNS) regardless of time and location. Moreover, the
geo-tagged photos that tourists upload to these services reveal
the perceptions and actions of tourists as well as the
images that tourists hold of sightseeing attractions
[1]. As the images of tourist sites are closely associated
with tourists’ attraction and intention, they serve as a
reference for other tourists who plan to travel to those sites
[2]. In addition, as tourist images on SNS are
continually produced and reproduced, we can
ascertain the perceptions and trends of representative
sightseeing elements and locations by analyzing the images
uploaded to SNS. Furthermore, this process contributes to
basic tourism research aimed at discovering, developing,
and improving sightseeing attractions [3].
Because geo-tagged photos contain locational information, we think that
the information extracted from them can be analyzed in depth in tandem with existing
methods of spatial data analysis. In particular, Flickr data are
well suited to this purpose because
the location and time of each photo are automatically
recorded in the photo metadata. Previous studies that have
utilized geotagged data on SNS have mostly explored the
locations that users visited [4–6], patterns of movement [7, 8], and the text accompanying uploaded photos [9–16].
However, as image analysis using deep learning technology has become available, studies using the photos
posted on SNS have been increasing. Examples of
research analyzing photos posted on SNS
include the classification of food [11], the analysis of bird observations made by experts and ordinary people [17], and the estimation of weather preferences for visiting specific places
[18]. Most of these studies focus on analyzing
photos that contain specific objects. There have been no
studies that analyze tourists’ image of an area by
classifying all of the photos posted by the tourists who
visit that area.
The purpose of this study is to analyze the representative
images and elements of sightseeing attractions by analyzing the photos uploaded to Flickr by tourists in Seoul. For
this purpose, first, we crawled the photos uploaded to
Flickr, a Social Network Service (SNS)
platform on which people can share geotagged photos, and
classified users into residents and tourists. Second, we delineated
11 regions of attraction (RoA) in Seoul by analyzing the
spatial density of the photos uploaded by tourists. Third,
we classified the photos into 1000 categories using the
Inception v3 model, a convolutional
neural network (CNN) for deep learning, and then grouped
the 1000 categories into 14 categories.
Finally, we analyzed the characteristics of the photo images by
RoA.
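The second to fourth steps can be illustrated with a minimal Python sketch. It is not the procedure used in this study: the grid-based density count stands in for whatever spatial density analysis defines an RoA, and the function names, cell size, threshold, sample records, and coarse group names are all hypothetical. The label strings are only examples of fine-grained categories, and the 14 categories themselves are not listed in this section.

```python
import math
from collections import Counter, defaultdict

# Hypothetical mapping from fine-grained labels to coarse groups.
# These group names are illustrative assumptions, not the paper's 14 categories.
LABEL_TO_GROUP = {
    "palace": "historic architecture",
    "monastery": "historic architecture",
    "restaurant": "food and dining",
    "plate": "food and dining",
    "toyshop": "shopping",
}


def dense_cells(points, cell_size, min_count):
    """Bin (lon, lat) points into square grid cells and keep the dense ones.

    A simple stand-in for a spatial density analysis (assumption).
    """
    counts = Counter(
        (math.floor(lon / cell_size), math.floor(lat / cell_size))
        for lon, lat in points
    )
    return {cell: n for cell, n in counts.items() if n >= min_count}


def group_label(label, default="other"):
    """Map one fine-grained label to its coarse group."""
    return LABEL_TO_GROUP.get(label, default)


def category_shares_by_roa(photos):
    """Return {roa: {group: share}} from (roa, label) pairs."""
    counts = defaultdict(Counter)
    for roa, label in photos:
        counts[roa][group_label(label)] += 1
    shares = {}
    for roa, counter in counts.items():
        total = sum(counter.values())
        shares[roa] = {group: n / total for group, n in counter.items()}
    return shares


# Made-up photo coordinates (lon, lat): three close together, one far away.
points = [
    (126.9770, 37.5790),
    (126.9772, 37.5791),
    (126.9775, 37.5793),
    (127.1000, 37.5100),
]
hot = dense_cells(points, cell_size=0.01, min_count=3)

# Made-up records: (RoA name, top-1 label predicted for the photo).
sample = [
    ("Jongno", "palace"),
    ("Jongno", "palace"),
    ("Jongno", "restaurant"),
    ("Jongno", "monastery"),
    ("Jamsil", "toyshop"),
    ("Jamsil", "plate"),
]
shares = category_shares_by_roa(sample)
```

In this toy data, `hot` contains only the grid cell that holds the three neighbouring photos, and `shares` gives the proportion of each coarse group per RoA, which is the kind of per-region profile the final step compares.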
2 Research on image data mining via convolutional neural networks
Image data mining is the process of extracting information
or knowledge from image data [19]. Recently, with the
increase in the volume of image data and the
improvement of training algorithms, image
data mining techniques using artificial neural networks have been
applied to various fields such as medicine, environmental
studies, information science, and computer graphics [20].
The convolutional neural network (CNN), a type of
artificial neural network, was developed based on
neurological knowledge of the visual cortex of
humans and animals [21]. As CNNs have been shown to be
effective in distinguishing and categorizing photo
images, it has become common to use them in
image data mining research. A CNN is basically composed of
three types of layers: a convolutional layer, a pooling layer,
and a fully connected layer. One can not only produce a
variety of models by changing the CNN configuration, but
also train the CNN by scanning the image
characteristics.
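To make the roles of the three layer types concrete, the following toy forward pass in plain Python applies one convolutional layer, one pooling layer, and one fully connected layer to a 5 x 5 image. It is only a sketch under stated assumptions: the kernel, weights, and biases are hand-picked rather than learned, the function names are hypothetical, and a real model such as Inception v3 stacks many more layers of each kind.

```python
def conv2d(image, kernel):
    """Valid 2D convolution (cross-correlation, as in most CNN libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Element-wise rectified linear activation."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling with a square window."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


def fully_connected(features, weights, biases):
    """One dense layer: a weighted sum plus a bias for each output unit."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]


# Toy 5 x 5 image: dark on the left, bright on the right (a vertical edge).
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]

# Hand-picked kernel that responds where brightness increases left to right.
kernel = [[-1.0, 1.0], [-1.0, 1.0]]

feature_map = relu(conv2d(image, kernel))      # 4 x 4 feature map
pooled = max_pool(feature_map, size=2)         # 2 x 2 after pooling
flat = [v for row in pooled for v in row]      # flatten for the dense layer

# Hand-picked weights for two hypothetical output classes.
weights = [[0.5, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 1.0]]
biases = [0.0, 0.1]
scores = fully_connected(flat, weights, biases)
```

The convolution responds only at the column where the edge lies, pooling keeps that response while shrinking the map, and the fully connected layer turns the pooled features into one score per class. Training would adjust `kernel`, `weights`, and `biases` instead of fixing them by hand.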
Research on classifying images by category using
CNNs has been actively conducted in the field of
medicine. Krishnan et al. [22] categorized liver diseases
appearing in ultrasound images. Sawant
et al. [23] detected brain cancer from MRI scans, and Motlagh
et al. [24] distinguished breast cancer in images of
histopathological samples. CNNs have also been
applied in other fields of image mining. Park and Shim [25]
established a model of discerning the genre from the
images of movie posters, taking inspiration from the
thought that elements such as (...truncated)