A SEMI-AUTOMATIC PROCEDURE FOR A DEMOGRAPHIC ANALYSIS OF THE FOSS4G DEVELOPERS’ COMMUNITY

Jul 2018

Open and direct collaboration on the creation, improvement, and documentation of source code and software applications, enabled by the web, is a distinctive feature of Free and Open Source Software for Geospatial (FOSS4G) projects and, at the same time, one of their main strengths. It is therefore of interest to monitor both the evolution and the geographical distribution of the developers’ communities in order to investigate their actual extent and degree of activity. This work describes a semi-automatic procedure for performing this analysis. The procedure relies mainly on the GitHub Search Application Programming Interface, accessed through custom JavaScript modules, to perform a census of the users registered with a collaborator role in the repositories of the most popular FOSS4G projects hosted on the GitHub platform. The collected data is processed and analysed using Python and QGIS. The results, presented through tables, charts, and thematic maps, describe both the size and the geographical heterogeneity of the contributing community of each project, and identify the countries most active, in terms of number of contributors, in the development of the most popular FOSS4G projects. Finally, the limits of the analysis, including technical constraints and considerations on the significance of the developers' census, are highlighted and discussed.


The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-4/W8, 2018
FOSS4G 2018 – Academic Track, 29–31 August 2018, Dar es Salaam, Tanzania

D. Oxoli a, H.-K. Kang b*, M. A. Brovelli a

a Department of Civil and Environmental Engineering, Politecnico di Milano, Milan, Italy - (daniele.oxoli, maria.brovelli)@polimi.it
b Geospatial Information Research Division, Korea Research Institute for Human Settlement, Sejong 30147, Korea

Commission IV, WG IV/4

KEY WORDS: Community, Collaboration, FOSS4G, GitHub, Software Development, Open Source

1. INTRODUCTION

The concepts of community and participation, together with the tenets of free dissemination and use of content, are the pillars of the open source movement. These principles also apply to the development of Free and Open Source Software for Geospatial (FOSS4G) (Brovelli et al., 2017). Open source communities are typically founded by individuals or groups who operate autonomously while recruiting and networking with other communities or community members, so as to contribute organically to the common goal of developing and maintaining software projects of interest (West & O'Mahony, 2008).

Community software development is generally carried out with distributed Version Control Systems, such as Git (https://git-scm.com), which enable the management of source code creation, revision, and deployment among large groups of contributors (Alwis & Sillito, 2009). Git is supported by several web platforms that allow contributors to host and manage their development work on cloud-based systems. Nowadays, one of the largest and most popular Git-based web-hosting platforms is GitHub (https://github.com) (Kalliamvakou et al., 2014). This distributed and participatory development approach raises a general interest in discovering both the geographical extent and the degree of activity of the developers’ communities (Thung et al., 2013).
Therefore, the primary goal of the presented work is to outline the contribution of individuals and national communities to the development and/or maintenance of the most popular FOSS4G projects. Because the FOSS4G ecosystem is extremely broad and heterogeneous, the most representative projects have been selected from those promoted by the Open Source Geospatial Foundation (OSGeo, https://www.osgeo.org), a not-for-profit organization whose mission is to foster the adoption of open geospatial technology worldwide by supporting community-driven software development. This goal is pursued by exploiting the capabilities of the GitHub platform, where a number of the OSGeo projects are hosted in dedicated source code repositories (Löwe et al., 2017).

The paper is structured as follows. Section 2 presents details and limitations of the data collection strategy adopted. Results are presented in Section 3. In Section 4, conclusions and future directions for the work are outlined.

2. DATA COLLECTION STRATEGY

Besides hosting facilities, GitHub provides a number of Application Programming Interfaces (APIs) that allow users to interact programmatically with the platform services. Of primary interest for this work are the capabilities of the GitHub Search API (https://developer.github.com/v3/search). The Search API lets developers query items within GitHub, such as users’ profiles and/or project repository features.

*Corresponding author
This contribution has been peer-reviewed.
https://doi.org/10.5194/isprs-archives-XLII-4-W8-171-2018 | © Authors 2018. CC BY 4.0 License.
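As an illustration of the kind of request the Search API accepts, the following Python sketch assembles query URLs for the users and repositories search endpoints. It is not the authors' JavaScript module: the function name build_search_url and the example qualifiers are hypothetical, and the sketch only builds the URL without sending it, assuming the endpoint pattern https://api.github.com/search/<kind>?q=... given in the public GitHub REST documentation.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.github.com"  # public GitHub REST API root


def build_search_url(kind, qualifiers, per_page=100):
    """Return a Search API URL (hypothetical helper, not from the paper).

    kind       -- 'users' or 'repositories'
    qualifiers -- dict of search qualifiers, e.g. {'location': 'Italy'}
    per_page   -- page size requested from the API
    """
    if kind not in ("users", "repositories"):
        raise ValueError(f"unsupported search kind: {kind!r}")
    terms = []
    for key, value in qualifiers.items():
        # Multi-word values are wrapped in double quotes.
        if " " in value:
            value = f'"{value}"'
        terms.append(f"{key}:{value}")
    # urlencode percent-escapes ':' and turns spaces into '+'.
    query = urlencode({"q": " ".join(terms), "per_page": per_page})
    return f"{API_ROOT}/search/{kind}?{query}"


if __name__ == "__main__":
    # Example qualifiers chosen for illustration only.
    print(build_search_url("users", {"location": "Italy", "type": "user"}))
```

Any real use of such URLs would also have to handle pagination and the rate limits that GitHub applies to search requests.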
A dedicated software application implementing the presented procedure, based on custom JavaScript modules and Python scripts, has been developed by the authors and made available on GitHub (https://github.com/danioxoli/FOSS4G_contributors). A short guide for running the data collection and analysis is included in the repository to ease the reuse of the application.

2.1 Data collection and processing

The Search API has been exploited to automatically retrieve from the GitHub platform, first, the number of users who have contributed to each of the selected OSGeo projects (see Figure 1) and, second, the location of each identified user by querying the user's personal GitHub profile. Location is one of the optional fields that users are free to specify when creating their personal GitHub accounts. Therefore, location information is not available for all of the identified users.

The procedure designed for discovering the contributors of each project of interest requires as input a list of GitHub project repositories, in CSV format, including their URLs. A dedicated JavaScript module collects the contributors’ GitHub usernames for each repository in the input CSV list by compiling and sending specific Search API requests. The module combines the API responses into a single JSON dictionary whose keys are the repository names and whose values are the lists of detected contributors’ GitHub usernames. The output of this first step is passed in input to a (...truncated)
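The first steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (which uses JavaScript modules): the CSV column name url, the function names, and the injected fetch_usernames and fetch_location callables are all hypothetical, and the callables stand in for the actual API requests so that the sketch runs offline with made-up data.

```python
import csv
import io
import json
from urllib.parse import urlparse


def read_repositories(csv_text, url_column="url"):
    """Return (owner, repo) pairs parsed from the repository URLs.

    The column name 'url' is an assumption about the input CSV layout.
    """
    pairs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        path = urlparse(row[url_column].strip()).path.strip("/")
        owner, repo = path.split("/")[:2]
        pairs.append((owner, repo))
    return pairs


def collect_contributors(pairs, fetch_usernames):
    """Build {repository name: [usernames]} for all repositories.

    fetch_usernames(owner, repo) is a placeholder for the API requests.
    Duplicate usernames are dropped and the list is sorted.
    """
    return {
        repo: sorted(set(fetch_usernames(owner, repo)))
        for owner, repo in pairs
    }


def attach_locations(contributors, fetch_location):
    """Map each distinct username to its profile location, or None.

    Location is optional on GitHub profiles, so None must be expected.
    """
    users = {name for names in contributors.values() for name in names}
    return {name: fetch_location(name) for name in sorted(users)}


if __name__ == "__main__":
    # Made-up input and stub fetchers, for illustration only.
    csv_text = "name,url\nproject-one,https://github.com/example-org/project-one\n"
    fake_users = {"project-one": ["user_b", "user_a"]}
    fake_locations = {"user_a": "Milan, Italy"}

    pairs = read_repositories(csv_text)
    contributors = collect_contributors(pairs, lambda owner, repo: fake_users[repo])
    locations = attach_locations(contributors, fake_locations.get)

    print(json.dumps(contributors, indent=2))
    print(json.dumps(locations, indent=2))
```

Passing the fetch functions as arguments keeps the data handling separate from the network layer, so the dictionary-building logic can be checked without calling the GitHub API.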



D. Oxoli, H.-K. Kang, M. A. Brovelli. A SEMI-AUTOMATIC PROCEDURE FOR A DEMOGRAPHIC ANALYSIS OF THE FOSS4G DEVELOPERS’ COMMUNITY, 2018, pp. 171-174, Issue XLII-4-W8, DOI: 10.5194/isprs-archives-XLII-4-W8-171-2018