Modeling post-holiday surge in COVID-19 cases in Pennsylvania counties
PLOS ONE
RESEARCH ARTICLE
Benny Ren*, Wei-Ting Hwang
Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia,
Pennsylvania, United States of America
*
OPEN ACCESS
Citation: Ren B, Hwang W-T (2022) Modeling post-holiday surge in COVID-19 cases in Pennsylvania counties. PLoS ONE 17(12): e0279371. https://doi.org/10.1371/journal.pone.0279371
Editor: Chong Wang, Iowa State University,
UNITED STATES
Received: April 25, 2022
Accepted: December 6, 2022
Published: December 19, 2022
Peer Review History: PLOS recognizes the
benefits of transparency in the peer review
process; therefore, we enable the publication of
all of the content of peer review and author
responses alongside final, published articles. The
editorial history of this article is available here:
https://doi.org/10.1371/journal.pone.0279371
Copyright: © 2022 Ren, Hwang. This is an open
access article distributed under the terms of the
Creative Commons Attribution License, which
permits unrestricted use, distribution, and
reproduction in any medium, provided the original
author and source are credited.
Data Availability Statement: The data can be freely accessed from the New York Times GitHub repository (https://github.com/nytimes/covid-19-data).
Abstract
COVID-19 arrived in the United States in early 2020, with cases quickly reported in many states, including Pennsylvania. Many statistical models have been proposed to understand the trends of the COVID-19 pandemic and the factors associated with increasing cases. While Poisson regression is a natural choice for modeling case counts, this approach fails to account for correlation due to spatial location. Because COVID-19 is contagious and often spreads through community infection, case counts are inevitably spatially correlated: counties neighboring a county with a high COVID-19 case count are more likely to have a high case count themselves. In this analysis, we combine generalized estimating equations (GEEs) for Poisson regression, a popular method for analyzing correlated data, with a semivariogram to model daily COVID-19 case counts in 67 Pennsylvania counties between March 20, 2020 and January 23, 2021 in order to study infection dynamics during the beginning of the pandemic. As the working correlation, we use a semivariogram that describes the spatial correlation as a function of the distance between two counties. We further incorporate a zero-inflated model in our spatial GEE to accommodate excess zeros in reported cases due to logistical challenges associated with disease monitoring. By modeling time-varying holiday covariates, we estimated the effect of holiday timing on case counts. Our analysis showed that the incidence rate ratio was significantly greater than one 6-8 days after a holiday, suggesting a surge in COVID-19 cases approximately one week after a holiday.
Funding: WH is supported by National Institute of Environmental Health Sciences grant P30ES013508. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. There was no additional external funding received for this study.
Competing interests: The authors have declared that no competing interests exist.
Introduction
COVID-19, a highly contagious respiratory disease, first appeared in China at the end of 2019 and quickly spread across the world [1]. Evidence suggests that mask-wearing and social distancing are effective strategies for containing COVID-19 [2, 3]. During the beginning of the pandemic, local and state governments quickly moved to implement mask mandates, travel restrictions and community containment measures (e.g., shelter in place) to mitigate the spread of the disease [4–6]. However, many Americans still chose to travel and congregate during the pandemic, and this behavior was heightened around federal holidays. Given the lack of adherence to public health guidance during the holidays, one should expect to see a surge in COVID-19 cases after a holiday. While many reports reaffirm this hypothesis, they are based on anecdotal evidence such as summary statistics of case counts from moving time windows. Only a handful of epidemiological studies estimate the association between holiday timing and the number of reported COVID-19 cases [7, 8]. For the first year of the pandemic, we hypothesize that a surge in COVID-19 cases should appear within two weeks after a holiday: the incubation period for COVID-19 extends up to 14 days, with a median of 4-5 days from exposure to symptom onset, and it can take up to an additional 3 days for a positive result from either a PCR test or a rapid antigen test to be reported [1, 9–11]. We consider daily case counts between March 20, 2020 and January 23, 2021 to study early pandemic dynamics prior to widespread vaccine distribution, COVID-19 variants and at-home testing. In addition, rigorous disease surveillance and reporting procedures were in place during the beginning of the pandemic, resulting in comprehensive infection data.
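The timing hypothesis above can be expressed as a time-varying covariate that records how many days have passed since the most recent holiday. The following is a minimal sketch and not the authors' code: the function names `days_since_holiday` and `lag_indicators`, the 14-day window, and the two example holiday dates are assumptions chosen for illustration.

```python
from datetime import date

# Hypothetical holiday list for illustration only (Thanksgiving and
# Christmas 2020); the study's actual holiday set is not reproduced here.
HOLIDAYS = [date(2020, 11, 26), date(2020, 12, 25)]


def days_since_holiday(day, holidays=HOLIDAYS, window=14):
    """Days elapsed since the most recent holiday, or None if no holiday
    falls within `window` days before (or on) `day`."""
    lags = [(day - h).days for h in holidays if 0 <= (day - h).days <= window]
    return min(lags) if lags else None


def lag_indicators(day, holidays=HOLIDAYS, window=14):
    """One-hot list of length window + 1: entry k is 1 when `day` falls
    exactly k days after the most recent holiday, otherwise 0."""
    lag = days_since_holiday(day, holidays, window)
    return [1 if lag == k else 0 for k in range(window + 1)]
```

Each indicator can then enter a count regression as its own covariate, so that a separate incidence rate ratio is estimated for every lag in the window.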
Poisson regression with population size as an offset is a popular approach for modeling count data and incidence rates. Under the reporting guidelines of the COVID-19 datasets, cases may go unreported on certain dates, such as holidays and weekends [12]. These reporting practices have resulted in excess or structural zeros in the case counts, also known as zero-inflation [13]. We also need to consider spatial correlation among county-level COVID-19 case counts because vector-borne and transmissible diseases, such as COVID-19, exhibit non-negligible spatial correlation as the movement of people can spread the virus to nearby counties [14]. Furthermore, processes that are confounded by spatially correlated variables are also suited to spatial models; a well-studied example is the relationship between disease and pollution [15–17]. Thus, we expect to see similar case numbers or trends among neighboring counties [18].
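To illustrate how a population offset turns a count model into an incidence-rate model, consider the simplest case: a Poisson regression with log link, a log-population offset, an intercept and a single binary covariate. In that special case the maximum-likelihood incidence rate ratio has a closed form, the ratio of the pooled rates in the two groups. The sketch below uses made-up county-day records and a hypothetical function name, `poisson_irr`; it ignores spatial correlation and zero-inflation, so it is not the model fitted in this paper.

```python
import math


def poisson_irr(cases, population, exposed):
    """Closed-form MLE for log E[Y] = log(population) + b0 + b1 * exposed,
    where `exposed` is a 0/1 indicator. Returns (IRR, b1) with IRR = exp(b1)."""
    c1 = sum(c for c, x in zip(cases, exposed) if x == 1)
    n1 = sum(n for n, x in zip(population, exposed) if x == 1)
    c0 = sum(c for c, x in zip(cases, exposed) if x == 0)
    n0 = sum(n for n, x in zip(population, exposed) if x == 0)
    rate1 = c1 / n1  # pooled incidence rate when the indicator is 1
    rate0 = c0 / n0  # pooled incidence rate when the indicator is 0
    irr = rate1 / rate0
    return irr, math.log(irr)


# Made-up records: two county-days outside and two inside a post-holiday window.
cases = [30, 12, 45, 20]
population = [100_000, 50_000, 100_000, 50_000]
exposed = [0, 0, 1, 1]

irr, beta1 = poisson_irr(cases, population, exposed)
```

An incidence rate ratio above one indicates a higher per-capita case rate when the indicator is on. With several covariates or a working correlation structure no closed form exists, and the estimates have to be obtained iteratively.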
Mixed models are powerful tools for spatial modeling due to their ability to handle a complex spatial correlation structure, usually represented as a semivariogram or kriging process [19, 20]. Mixed modeling problems have been addressed from both convex optimization and Bayesian computation perspectives [14, 21]. Correlation in spatial epidemiology can also be captured using conditional autoregressive models [22–24]. Inferential summaries from these models assume a correctly specified spatial correlation; otherwise, a post-estimation robust covariance can be derived to address the misspecified correlation. One such class of robust estimators is the
heteroskedasticity-consistent or sa (...truncated)