Effective Learning During COVID-19: Multilevel Covariates Matching and Propensity Score Matching
Annals of Data Science
https://doi.org/10.1007/s40745-022-00392-x
Siying Guo1 · Jianxuan Liu2 · Qiu Wang3
Received: 13 January 2021 / Revised: 6 March 2022 / Accepted: 12 March 2022
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2022
Abstract
In large-scale observational data with a hierarchical structure, both clusters and interventions often have more than two levels. Popular methods in the binary treatment
literature do not naturally extend to the hierarchical multilevel treatment case. For
example, most K-12 schools and universities moved to an unprecedented hybrid learning
module during the COVID-19 pandemic, in which learning modes included hybrid and
fully remote learning, while students were clustered within classes and school regions.
It is challenging to evaluate the effect of multilevel treatments on learning outcomes in hierarchically structured data. In this paper, we study a covariates
matching method and develop a generalized propensity score matching method to
reduce the bias of estimation in the intervention effect. We also propose simple algorithms to assess the covariates balance for each approach. We examine the finite sample
performance of the methods via simulation studies and apply the proposed methods
to analyze the effectiveness of learning modes during the COVID-19 pandemic.
Keywords COVID-19 · Generalized propensity score · Matching · Multilevel hybrid
learning · Potential outcome
Corresponding author: Jianxuan Liu
Siying Guo
Qiu Wang
1 School of Criminal Justice and Public Administration, Kean University, Union, USA
2 Department of Mathematics, Syracuse University, Syracuse, USA
3 School of Education, Syracuse University, Syracuse, USA
1 Introduction
The global impact of COVID-19 [1] has led to social and economic crises [2], further
widening inequalities and exacerbating global poverty. In response to the COVID-19 pandemic, local, state, and federal agencies have implemented social distancing or
lockdown measures designed to slow the spread of the disease [3]. These measures have changed daily routines, which has profoundly
affected the academic learning as well as the psychological and physical health of K-12
and college students. Students with special needs, minority students, and students from low-income families experienced additional negative impacts [4]. The UN 2020 report [5] showed that, since the
outbreak of COVID-19, more than 1.52 billion children and youth,
87% of the global student population, have been unable to learn in traditional classroom settings. Most
K-12 schools and colleges have moved to online learning or a new hybrid learning module to maintain
social distancing and slow the spread of disease. In effect, the pandemic has been an
extraordinarily challenging time for teachers and students, especially the transition to
new teaching and learning modes. However, the pandemic has created an opportunity
to rethink how we educate and to improve pedagogies to help students succeed. In the
era of big data, numerous studies have sought to extract knowledge from large-scale
data; see, for example, [6–10], among others. In terms of educational policy, it
is fundamentally important to evaluate the effectiveness of the unprecedented learning modules during the pandemic across a wide range of school clusters and student
backgrounds. To understand these complexities, we analyze
hierarchically structured data from observational studies.
In large-scale observational data with a hierarchical structure, both clusters and
interventions often have more than two levels [11]. The larger units may be clusters [12],
groups [13, 14], communities [15], or schools [16–19]. When such units are randomized, the design is referred to as a cluster-randomized trial (CRT) design [12, 14]. A CRT design ensures that each cluster consists
of multiple comparable individuals [20], so that their baseline characteristics do
not confound the corresponding outcomes. For example, in educational studies using
cluster designs, cluster sizes varied from 5 in the ECLS (Early Childhood Longitudinal
Study) to 60 in the LSAY (Longitudinal Study of American Youth), with a mean of about
13 [21].
When clusters are assigned to educational interventions, imbalance in baseline
covariates among groups often occurs [22], which results in selection bias. Selection
bias is ubiquitous in observational studies, where the “golden rule” of randomization
fails [23], and an analysis that ignores it will not necessarily yield a causal estimate of the intervention effect [24].
However, it is not always feasible to conduct randomized trials because of cost-related
concerns and, more importantly, ethical issues. When large-scale hierarchically structured data from observational studies are employed, it is crucial to remove selection
bias [16, 25] so that the data can be viewed as if they were from randomized studies.
As in all observational studies, the baseline characteristics of individuals
in each hierarchical cluster are not guaranteed to be comparable, which may confound the
outcome. To establish the causal effect of the intervention, a large body of literature
has been developed to evaluate the average causal effect consistently. The methods can
be classified as matching [26–29], stratification [26, 30], covariance adjustment [26],
inverse probability weighting [31–34], and augmented inverse probability weighting
which provides double protection against model misspecification [35–38].
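As a minimal illustration of why such adjustment matters, the following Python sketch simulates a binary treatment whose assignment depends on a single binary confounder, and compares the naive difference in means with an inverse probability weighting (IPW) estimate. The data-generating model, the simulated effect of 2, and all function names are assumptions made for this illustration; they are not taken from the methods developed in this paper.

```python
import random


def simulate(n=20000, seed=0):
    """Simulate (x, t, y) with a confounder x that drives both t and y.

    Assumed (hypothetical) model: x ~ Bernoulli(0.5),
    P(t=1 | x) = 0.8 if x == 1 else 0.2, and y = 2*t + 3*x + noise.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        p = 0.8 if x == 1 else 0.2
        t = 1 if rng.random() < p else 0
        y = 2.0 * t + 3.0 * x + rng.gauss(0.0, 1.0)
        rows.append((x, t, y))
    return rows


def naive_difference(rows):
    """Unadjusted difference in mean outcome between treated and control."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def ipw_estimate(rows):
    """IPW estimate of the average effect, with the propensity score
    estimated as the observed treated fraction within each level of x."""
    e_hat = {}
    for level in (0, 1):
        ts = [t for x, t, _ in rows if x == level]
        e_hat[level] = sum(ts) / len(ts)
    n = len(rows)
    mu1 = sum(t * y / e_hat[x] for x, t, y in rows) / n
    mu0 = sum((1 - t) * y / (1.0 - e_hat[x]) for x, t, y in rows) / n
    return mu1 - mu0


rows = simulate()
naive = naive_difference(rows)
ipw = ipw_estimate(rows)
print(f"naive difference: {naive:.2f}")
print(f"IPW estimate:     {ipw:.2f}  (simulated effect is 2)")
```

Under these assumptions the naive contrast absorbs part of the confounder's effect, whereas the weighted contrast should land near the simulated effect of 2.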
To estimate causal effects using observational data, it is preferable to mimic a randomized experiment as closely as possible by balancing the covariates among
different treatment groups. It is natural to match on covariates so that the observed
samples can be viewed as if they were from a randomized experiment. However, it is not always
possible to obtain matched covariates when the number of clusters is large while
the size of each cluster is small, because of the sparsity in the hierarchically structured population. Reference [26] made an important advance with the introduction of the
propensity score to circumvent the problem. The propensity score is defined as the
conditional probability of a treatment given the observed covariates. It is also a balancing score in the sense that, conditional on the propensity score, the distributions of
the measured covariates are the same between treatment groups. In order to evaluate
the actual intervention effects of instructional or educational methods, a growing number of educational studies have employed propensity scores to reduce
the bias known to plague observational studies and to increase the balance between treatment and comparison groups [39–43]. However, these studies focused on either binary
treatment options or several treatments in non-hierarchically structured populations.
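The balancing property described above can be checked numerically. The sketch below, for the binary treatment case only, estimates a propensity score by logistic regression, matches each treated unit to its nearest control on the estimated score, and compares the standardized mean difference of each covariate before and after matching. The simulated covariates, the logistic form, and all names are hypothetical choices for illustration, not the generalized propensity score procedure proposed in this paper.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n=2000, seed=1):
    """Draw (x1, x2, t); treatment probability depends on both covariates."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = 1.0 if rng.random() < 0.5 else 0.0
        p = sigmoid(-0.5 + 1.0 * x1 + 1.0 * x2)  # assumed true propensity
        t = 1 if rng.random() < p else 0
        data.append((x1, x2, t))
    return data


def fit_logistic(data, steps=300, lr=0.5):
    """Fit P(t=1 | x1, x2) by gradient ascent on the mean log-likelihood."""
    b = [0.0, 0.0, 0.0]
    n = len(data)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for x1, x2, t in data:
            resid = t - sigmoid(b[0] + b[1] * x1 + b[2] * x2)
            grad[0] += resid
            grad[1] += resid * x1
            grad[2] += resid * x2
        b = [bj + lr * gj / n for bj, gj in zip(b, grad)]
    return b


def smd(group_a, group_b):
    """Absolute standardized mean difference of one covariate."""
    def mean(v):
        return sum(v) / len(v)

    def var(v):
        m = mean(v)
        return sum((a - m) ** 2 for a in v) / (len(v) - 1)

    pooled_sd = math.sqrt((var(group_a) + var(group_b)) / 2.0)
    return abs(mean(group_a) - mean(group_b)) / pooled_sd


data = simulate()
b = fit_logistic(data)
# Each record: (estimated propensity score, x1, x2, t)
scored = [(sigmoid(b[0] + b[1] * x1 + b[2] * x2), x1, x2, t)
          for x1, x2, t in data]
treated = [r for r in scored if r[3] == 1]
control = [r for r in scored if r[3] == 0]

# 1:1 nearest-neighbor matching with replacement on the estimated score
matched = [min(control, key=lambda c: abs(c[0] - u[0])) for u in treated]

balance = {}
for j, name in ((1, "x1"), (2, "x2")):
    before = smd([r[j] for r in treated], [r[j] for r in control])
    after = smd([r[j] for r in treated], [r[j] for r in matched])
    balance[name] = (before, after)
    print(f"{name}: SMD before = {before:.3f}, after = {after:.3f}")
```

Matching with replacement is used here only to keep the sketch short; it reuses controls and so reduces the effective sample size, which a real analysis would need to account for.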
Further, binary (...truncated)