Qualitative coding, or content analysis, is more than just labeling text: it is a reflexive interpretive practice that shapes research questions, refines theoretical insights, and illuminates subtle social dynamics. As large language models (LLMs) become increasingly adept at nuanced language tasks, questions arise about whether—and how—they can assist in large-scale coding...
Public procurement, a critical but often overlooked aspect of governance, plays a pivotal role in steering the acquisition of goods and services and the commissioning of public works. Our study, analyzing over one million public procurement contracts from the Portuguese public administration, applies network science to unravel the complexities of this market. We uncover a market...
This paper introduces and tests an unsupervised method for detecting novel coordinated inauthentic information operations (CIOs) in realistic settings. This method uses Bayesian inference to identify groups of accounts that share similar account-level characteristics and target similar narratives. We solve the inferential problem using amortized variational inference, allowing us...
Mobile phone data have played a key role in quantifying human mobility during the COVID-19 pandemic. Existing studies on mobility patterns have primarily focused on regional aggregates in high-income countries, obfuscating the accentuated impact of the pandemic on the most vulnerable populations. Leveraging geolocation data from mobile-phone users and population census for 6...
Music playlist creation is a crucial, yet not fully explored task in music data mining and music information retrieval. Previous studies have largely focused on investigating diversity, popularity, and serendipity of tracks in human- or machine-generated playlists. However, the concept of playlist coherence – vaguely defined as smooth transitions between tracks – remains poorly...
Misinformation and disinformation are growing threats in the digital age, affecting people across languages and borders. However, no research has investigated the prevalence of multilingual misinformation and quantified the extent to which misinformation diffuses across languages. This paper investigates the prevalence and dynamics of multilingual misinformation through an...
Value chain data is crucial for navigating economic disruptions. Yet, despite its importance, we lack publicly available product-level value chain datasets, since resources such as the “World Input-Output Database”, “Inter-Country Input-Output Tables”, “EXIOBASE”, and “EORA” lack information about products (e.g. Radio Receivers, Telephones, Electrical Capacitors, LCDs, etc.) and...
Many actors use strategic communications to impact media debates through targeted messages and campaigns, but the scale and diversity of online media content make it difficult to evaluate the impact of a particular message or campaign. In this paper, we present a new technique that leverages semantic similarity of actor messages and media content to quantify the change in media...
This paper employs Unfolded Adjacency Spectral Embedding (UASE) to investigate the temporal evolution of economic relationships between locations in Great Britain. We utilise timestamped, geolocated website hyperlink data between archived, commercial websites in Britain, which are aggregated to create a set of directed, weighted networks of hyperlink connections between Local...
Social bots remain a major vector for spreading disinformation on social media and a menace to the public. Despite the progress made in developing multiple sophisticated social bot detection algorithms and tools, bot detection remains a challenging, unsolved problem that is fraught with uncertainty due to the heterogeneity of bot behaviors, training data, and detection algorithms...
Changes in individual and institutional financial behavior leading to shifts in liquidity flows often depend on events reflected in news. However, the task of establishing a relationship between financial behavior and news remains challenging and understudied. We propose a news-based feature generation approach that allows accounting for news events in liquidity flow time-series...
Despite decades-long efforts to increase diversity, underrepresented social groups remain small minorities in many fields. Here, we ask whether disparities in global recognition exist for traditionally underrepresented demographic groups. We investigate whether a notable person’s demographic attributes are associated with their global recognition, considering both the global...
Credibility signals represent a wide range of heuristics typically used by journalists and fact-checkers to assess the veracity of online content. Automating the extraction of credibility signals presents significant challenges due to the necessity of training high-accuracy, signal-specific extractors, coupled with the lack of sufficiently large annotated datasets. This paper...
Understanding criminal motives is crucial for analyzing criminal psychology and predicting judicial outcomes. Traditional methods for crime motive analysis are heavily based on statistical techniques, requiring specialized knowledge and substantial human resources. With the increasing availability of judicial data, such as legal documents, machine learning approaches hold great...
Understanding stock market instability is a key question in financial management as practitioners seek to forecast breakdowns in long-run asset co-movement patterns which expose portfolios to rapid and devastating collapses in value. These disruptions are linked to changes in the structure of market-wide stock correlations which increase the risk of high volatility shocks. The...
Temporal networks are commonly used to model real-life phenomena. When these phenomena represent interactions and are captured at a fine-grained temporal resolution, they are modeled as link streams. Community detection is an essential network analysis task. Although many methods exist for static networks, and some methods have been developed for temporal networks represented as...
The transfer velocity of money is a macroeconomic quantity that measures the frequency of exchanges in an economy. For cryptoassets, it can be measured exactly by adopting a new approach, MicroVelocity. In this study, we apply the framework to Ether, the native cryptocurrency of the Ethereum blockchain, to investigate velocity and its top contributors and how they can be characterised...
The Russo-Ukrainian War represents a significant contemporary conflict between two global powers, yet the dynamics of human-bot engagement during this conflict, particularly on social media platforms like Twitter and Reddit, remain underexplored. Existing literature has not adequately addressed how bots and humans interact differently across languages within this geopolitical...
We typically think of the demand volume for a business in a city as a function of basic characteristics, such as the type of business, the quality of the product or service offered and its pricing. In addition, factors related to the urban environment, such as population density and accessibility, are also crucial and have been considered in the literature. However, these...
The dominance of online social media data as a source for large-scale social network studies has recently been challenged by networks constructed from state-curated register data. In this paper, which focuses on the cross-comparison of network structures, we investigate the similarities and differences of the Dutch online social network (OSN) Hyves and a register-based social...
Summarizing historical pandemic control experience can help the government better cope with the impact of uncertain public health events on the taxi industry. This paper presents a summary of the relationship between various pandemic control measures and the taxi system from the perspective of travel resilience. Additionally, we investigate the effectiveness of passenger subsidy schemes...
Networks where each node has one or more associated numerical values are common in applications. This work studies how summary statistics used for the analysis of spatial data can be applied to non-spatial networks for the purposes of exploratory data analysis. We focus primarily on Moran-type statistics and discuss measures of global autocorrelation, local autocorrelation and...
Sample-based music—characterized by the adoption of extant audio fragments (sampling) in its creation process—plays an essential role in contemporary popular music, fostering inter-generational connections between the creators that have resulted in a rich and diverse sonic landscape. The selection, manipulation, and adoption of samples heavily impact the genre, mood, texture, and...
Ending poverty in all its forms everywhere remains the number one Sustainable Development Goal of the United Nations 2030 Agenda. Governments face challenges in measuring socioeconomic status with fine spatial resolution because traditional data collection methods, such as censuses and surveys, are time-consuming, labor-intensive, performed at long intervals, and cover only a...