Prediction of employment and unemployment rates from Twitter daily rhythms in the US

EPJ Data Science, Jul 2017

By modeling macroeconomic indicators using digital traces of human activities on mobile or social networks, we can provide important insights into processes previously assessed only via paper-based surveys or polls. We collected aggregated workday activity timelines of US counties from the normalized number of messages sent in each hour on the online social network Twitter. In this paper, we show how county employment and unemployment statistics are encoded in the daily rhythm of people by decomposing the activity timelines into a linear combination of two dominant patterns. The mixing ratio of these patterns defines a measure for each county that correlates significantly with employment (\(0.46\pm0.02\)) and unemployment rates (\(-0.34\pm0.02\)). Thus, the two dominant activity patterns can be linked to rhythms signaling the presence or lack of regular working hours of individuals. The analysis could give policy makers better insight into the processes governing employment, where problems could be identified not only from the number of officially registered unemployed, but also from the digital footprints people leave on different platforms.

Eszter Bokányi, Zoltán Lábszki, Gábor Vattay

Keywords: unemployment prediction; Twitter; social media; activity patterns

1 Introduction

Until recently, collecting and analyzing data about individual humans at a large scale has been time-consuming, costly and arduous work. With the advent of the digital era, there is a growing amount of data accessible online that enables the analysis and modeling of human behavior. However, our understanding of these digital data sources and the methods that connect the data to real-world outcomes is still limited.
Recent papers have discussed several aspects of the possible use of mobile phone records and social media status updates in the estimation of official data, such as census, demographic or land use records. A promising approach is the analysis of the diurnal rhythm of humans. Due to the 24-hour periodicity of the Earth’s rotation, we are biologically bound to show daily periodic behavior both at the individual and at the aggregate level. This periodic cycle is governed mainly by internal biochemical processes [–], but the impact of external factors and the environment also leaves its imprint on these daily patterns [, ]. As Saramäki and Moro point out in their paper [], an interesting application is to consider the geospatial aspects of the aggregate level of daily rhythms, as it can provide insight into several different phenomena ranging from the actual land use patterns in a city [–] and on a campus [], to the tracking of anomalous events [, ], or the estimation of population size [], mobility patterns [], poverty [] or crime rates [] in a certain area. Because these aggregate patterns always consist of the superposition of the daily rhythms of individuals, it is worth investigating how the main features of the aggregate level emerge from this superposition. If we can cluster individuals into more or less homogeneously behaving groups based on their daily patterns [], then the aggregate pattern can be understood as the combination of the group patterns, and the group that has more individuals dominates the aggregate daily rhythm. The groups of individuals can form along many demographic and/or socioeconomic factors, of which being employed and going to and from work at regular hours is the most decisive one with respect to the daily activity patterns. Thus, separating the group patterns out of the aggregate patterns in different geographical regions may give insight into the employment statistics of those regions.
Nowcasting or estimating unemployment rates using the digital traces of search engines has already been the focus of several papers [–]. It has already been shown that daily activity patterns of individuals can be linked to the regularity of their working hours []. Because the loss of a job has severe psychological consequences [], the effects of a mass layoff can be detected in the unemployment rates and provide a possibility of forecasting macroeconomic effects based on the observation of several individuals []. In [], there is strong evidence that the aggregated daily activity of geographical regions in certain time intervals can be indicative of unemployment rates. In this paper we obtain  million geolocated messages from the publicly available stream of the social network Twitter from the area of the United States sent between January and October . We aggregate Monday to Friday relative tweeting activity for each hour in each US county to form an average workday activity pattern. We then assume that these activity patterns form a roughly linear subspace of the 24-hour “timespace”. By finding this linear subspace, that is, by find (...truncated)
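
The aggregation step described above, and the mixing ratio mentioned in the abstract, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `workday_profiles` and `mixing_ratio` are hypothetical, timestamps are assumed to be already converted to each county's local time, and the two reference patterns are assumed to be given as inputs, because the excerpt is truncated before it explains how they are found. The mixing ratio is estimated here by ordinary least squares, which is one plausible choice rather than the method confirmed by the source.

```python
from collections import defaultdict
from datetime import datetime


def workday_profiles(messages):
    """Build a normalized 24-bin workday activity profile per county.

    `messages` is an iterable of (county_id, datetime) pairs.
    Assumption: each datetime is already in the county's local time.
    """
    counts = defaultdict(lambda: [0] * 24)
    for county, timestamp in messages:
        if timestamp.weekday() < 5:  # Monday=0 ... Friday=4
            counts[county][timestamp.hour] += 1

    profiles = {}
    for county, hourly in counts.items():
        total = sum(hourly)  # at least 1, since only workday messages create entries
        profiles[county] = [c / total for c in hourly]
    return profiles


def mixing_ratio(profile, pattern_a, pattern_b):
    """Least-squares weight w in: profile ~ w * pattern_a + (1 - w) * pattern_b.

    Assumption: both patterns are 24-bin vectors supplied by the caller.
    """
    diff = [a - b for a, b in zip(pattern_a, pattern_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("the two patterns must differ")
    num = sum((x - b) * d for x, b, d in zip(profile, pattern_b, diff))
    return num / denom


if __name__ == "__main__":
    # Toy data: three workday messages and one Saturday message for county "A".
    toy = [
        ("A", datetime(2021, 3, 1, 9)),
        ("A", datetime(2021, 3, 1, 9)),
        ("A", datetime(2021, 3, 2, 20)),
        ("A", datetime(2021, 3, 6, 12)),  # weekend, ignored
    ]
    morning = [1.0 if h == 9 else 0.0 for h in range(24)]
    evening = [1.0 if h == 20 else 0.0 for h in range(24)]
    profile = workday_profiles(toy)["A"]
    print(mixing_ratio(profile, morning, evening))
```

With per-county mixing ratios in hand, one would correlate them against official county employment and unemployment rates. That comparison is omitted here because it depends on data not shown in this excerpt.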


This is a preview of a remote PDF: https://link.springer.com/content/pdf/10.1140%2Fepjds%2Fs13688-017-0112-x.pdf

Eszter Bokányi, Zoltán Lábszki, Gábor Vattay. Prediction of employment and unemployment rates from Twitter daily rhythms in the US, EPJ Data Science, 2017, pp. 14, Volume 6, Issue 1, DOI: 10.1140/epjds/s13688-017-0112-x