A SEMI-AUTOMATIC PROCEDURE FOR A DEMOGRAPHIC ANALYSIS OF THE FOSS4G DEVELOPERS’ COMMUNITY
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-4/W8, 2018
FOSS4G 2018 – Academic Track, 29–31 August 2018, Dar es Salaam, Tanzania
D. Oxoli a, H.-K. Kang b*, M. A. Brovelli a
a Department of Civil and Environmental Engineering, Politecnico di Milano, Milan, Italy -
(daniele.oxoli, maria.brovelli)@polimi.it
b Geospatial Information Research Division, Korea Research Institute for Human Settlement, Sejong
30147, Korea -
Commission IV, WG IV/4
KEY WORDS: Community, Collaboration, FOSS4G, GitHub, Software Development, Open Source
ABSTRACT:
The open and direct collaboration in the creation, improvement, and documentation of source code and software applications - enabled
by the web - is recognized as a distinctive feature of Free and Open Source Software for Geospatial (FOSS4G) projects and, at
the same time, one of their main strengths. With this in mind, it is of interest to monitor extensively both
the evolution and the geographical distribution of the developers’ communities in order to investigate their actual extent, evolution
and degree of activity. In this work, a semi-automatic procedure to perform this analysis is described. The procedure is
mainly based on the GitHub Search Application Programming Interface, accessed by means of custom JavaScript modules, to perform
a census of the users registered with a collaborator role in the repositories of the most popular FOSS4G projects hosted on the GitHub
platform. The collected data is processed and analysed using Python and QGIS. The results - presented through tables, charts, and
thematic maps - describe both the size and the geographical heterogeneity of the contributing community of each
individual project, and identify the most active countries - in terms of the number of contributors - in the development of
the most popular FOSS4G projects. The limits of the analysis, including technical constraints and considerations on the significance of the
developers' census, are finally highlighted and discussed.
1. INTRODUCTION
The concepts of community and participation - together with the
dogmas of free dissemination and use of content - represent
the pillars of the open source movement. These principles also
apply in the context of Free and Open Source Software for
Geospatial (FOSS4G) development (Brovelli et al., 2017).
Open source communities are typically founded by individuals or
groups which operate autonomously while recruiting and
networking with other communities or community members to
contribute organically to the common goal of developing and
maintaining software projects of interest (West & O'Mahony,
2008).
Community software development is generally performed
through the use of distributed Version Control Systems - such as
Git (https://git-scm.com) - which enable the management of
source code creation, revision, and deployment among large
groups of contributors (Alwis & Sillito, 2009). Git
is implemented in several web platforms, allowing
contributors to host and manage their development work on
cloud-based systems. Nowadays, one of the largest and most
popular Git-based web-hosting platforms is GitHub
(https://github.com) (Kalliamvakou et al., 2014).
This distributed and participatory development approach raises
a general interest in discovering both the geographical extent and
the degree of activity of the developers’ communities (Thung et al.,
2013). Therefore, the primary goal of the presented work is to
outline the contribution of individuals and national communities
to the development and/or maintenance of the most popular
FOSS4G projects. Because the FOSS4G ecosystem is extremely
broad and heterogeneous, the most representative projects have
been selected from those promoted by the Open Source
Geospatial Foundation (OSGeo, https://www.osgeo.org), a
not-for-profit organization whose mission is to foster the adoption of
open geospatial technology worldwide by supporting
community-driven software development. The goal is pursued by
exploiting the capabilities of the GitHub platform, where a
number of the OSGeo projects are hosted by means of dedicated
source code repositories (Löwe et al., 2017).
The paper is structured as follows: Section 2 presents details and
limitations of the data collection strategy adopted. Results are
presented in Section 3. In Section 4, conclusions and future
directions for the work are outlined.
2. DATA COLLECTION STRATEGY
Besides its hosting facilities, GitHub provides a number of
Application Programming Interfaces (APIs) which allow users to
interact programmatically with the platform services. Of primary
interest for this work are the capabilities enabled by the GitHub
Search API (https://developer.github.com/v3/search). The
Search API provides developers with functionalities for querying
items within GitHub, such as users’ profiles and/or project
repository features.
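As an illustration of how a Search API request can be composed, the following minimal Python sketch builds the URL of a user search. It is not taken from the authors' application: the helper name build_user_search_url and the chosen qualifiers (location, type) are assumptions made for the example, and no request is actually sent.

```python
from urllib.parse import urlencode

# Public endpoint of the GitHub Search API for user profiles.
SEARCH_USERS_ENDPOINT = "https://api.github.com/search/users"


def build_user_search_url(qualifiers, per_page=30, page=1):
    """Compose a Search API URL for user profiles (hypothetical helper).

    `qualifiers` maps search qualifiers to values, for example
    {"location": "Italy", "type": "user"}.
    """
    # Qualifiers are written as key:value pairs separated by spaces
    # inside the single `q` query parameter.
    query = " ".join(f"{key}:{value}" for key, value in qualifiers.items())
    params = {"q": query, "per_page": per_page, "page": page}
    # urlencode percent-encodes the colons and turns spaces into '+'.
    return f"{SEARCH_USERS_ENDPOINT}?{urlencode(params)}"


# Illustrative query only; the qualifiers used in the paper are not stated here.
url = build_user_search_url({"location": "Italy", "type": "user"})
print(url)
```

The response to such a request is a JSON document whose matching items are returned page by page, so a complete census has to iterate over the page parameter.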
*Corresponding author
This contribution has been peer-reviewed.
https://doi.org/10.5194/isprs-archives-XLII-4-W8-171-2018 | © Authors 2018. CC BY 4.0 License.
A dedicated software application to perform the presented
procedure - based on custom JavaScript modules and Python
scripts - has been developed by the authors and made available
on GitHub (https://github.com/danioxoli/FOSS4G_contributors). A short
guide for running the data collection and analysis can be found in
the above repository, to ease the reuse of the application.
2.1 Data collection and processing
The Search API has been exploited to automatically retrieve from
the GitHub platform, first, the number of users who have
contributed to each of the selected OSGeo projects (see Figure 1)
and, second, the location of each identified user by querying their
personal GitHub profile. Location is among the
optional fields which users are free to specify when creating
their personal GitHub accounts. Therefore, location
information is not available for all identified users.
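Since the location field is optional, any processing step has to tolerate profiles where it is absent or empty. The sketch below shows one possible way to handle this in Python. The function name extract_location and the sample profiles are hypothetical. The payloads only mimic the shape of a GitHub user profile (a login and an optional location) and are not data collected in this work.

```python
def extract_location(profile):
    """Return the free-text location of a profile, or None if it is not set."""
    location = profile.get("location")
    # The field may be missing, null, or filled with whitespace only.
    if location is None or not location.strip():
        return None
    return location.strip()


# Hypothetical payloads for illustration; not real users.
sample_profiles = [
    {"login": "user_a", "location": "Milan, Italy"},
    {"login": "user_b", "location": None},
    {"login": "user_c", "location": "   "},
    {"login": "user_d"},
]

# Map each username to its cleaned location (None when unavailable).
locations = {p["login"]: extract_location(p) for p in sample_profiles}

# Share of profiles for which a location could be recovered.
located = [login for login, loc in locations.items() if loc is not None]
coverage = len(located) / len(sample_profiles)
print(locations, coverage)
```

Keeping track of the share of located users, as done here with coverage, makes explicit how much of the census can actually be mapped.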
The designed procedure for discovering contributors of each
project of interest requires as input a list of GitHub project
repositories in CSV format including their URLs. A dedicated
JavaScript module is run to perform the collection of
contributors’ GitHub usernames for each repository included in
the input CSV list by compiling and sending specific Search API
requests. The module combines API responses into a single
JSON dictionary containing as keys the repository names and as
values the list of detected contributors’ GitHub usernames. The
output of this first step is passed in input to a (...truncated)