Modeling post-holiday surge in COVID-19 cases in Pennsylvania counties

PLOS ONE, Dec 2022

COVID-19 arrived in the United States in early 2020, with cases quickly being reported in many states including Pennsylvania. Many statistical models have been proposed to understand the trends of the COVID-19 pandemic and the factors associated with increasing cases. While Poisson regression is a natural choice for modeling case counts, this approach fails to account for correlation due to spatial location. Because COVID-19 is contagious and often spreads through community infection, case counts are inevitably spatially correlated: counties neighboring a county with a high COVID-19 case count are more likely to have a high case count themselves. In this analysis, we combine generalized estimating equations (GEEs) for Poisson regression, a popular method for analyzing correlated data, with a semivariogram to model daily COVID-19 case counts in 67 Pennsylvania counties from March 20, 2020 to January 23, 2021, in order to study infection dynamics during the beginning of the pandemic. As the working correlation, we use a semivariogram that describes the spatial correlation as a function of the distance between two counties. We further incorporate a zero-inflated model in our spatial GEE to accommodate excess zeros in reported cases due to logistical challenges associated with disease monitoring. By modeling time-varying holiday covariates, we estimated the effect of holiday timing on case counts. Our analysis showed that the incidence rate ratio was significantly greater than one 6-8 days after a holiday, suggesting a surge in COVID-19 cases approximately one week after a holiday.

Benny Ren, Wei-Ting Hwang
Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America

Citation: Ren B, Hwang W-T (2022) Modeling post-holiday surge in COVID-19 cases in Pennsylvania counties. PLoS ONE 17(12): e0279371. https://doi.org/10.1371/journal.pone.0279371
Editor: Chong Wang, Iowa State University, UNITED STATES
Received: April 25, 2022. Accepted: December 6, 2022. Published: December 19, 2022.
Peer Review History: PLOS recognizes the benefits of transparency in the peer review process; therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. The editorial history of this article is available here: https://doi.org/10.1371/journal.pone.0279371
Copyright: © 2022 Ren, Hwang. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability Statement: The data can be freely accessed from the New York Times GitHub repository (https://github.com/nytimes/covid-19-data).
Funding: WH is supported by National Institute of Environmental Health Sciences grant: P30ES013508. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. There was no additional external funding received for this study.
Competing interests: The authors have declared that no competing interests exist.

Introduction

COVID-19, a highly contagious respiratory disease, first appeared in China at the end of 2019 and quickly spread across the world [1]. Evidence suggests that mask-wearing and social distancing are effective strategies for containing COVID-19 [2, 3]. During the beginning of the pandemic, local and state governments quickly moved to implement mask mandates, travel restrictions and community containment measures (e.g., shelter in place) to mitigate the spread of the disease [4–6]. However, many Americans still choose to travel and congregate during the pandemic, behavior that is heightened during federal holidays. Due to lack of adherence to public health guidance during the holidays, one should expect to see a surge in COVID-19 cases after a holiday. While many reports reaffirm this hypothesis, they are based on anecdotal evidence such as summary statistics of case counts from moving time windows. There are only a handful of epidemiological studies that estimate the association between holiday timing and the number of reported COVID-19 cases [7, 8].

For the first year of the pandemic, we hypothesize that we should see a surge in COVID-19 cases within two weeks after a holiday, given that the incubation period for COVID-19 extends up to 14 days, with a median time of 4-5 days from exposure to symptom onset, and that it can take up to an additional 3 days for a positive result, from either a PCR test or a rapid antigen test, to be reported [1, 9–11]. We consider daily case counts from March 20, 2020 to January 23, 2021 to study early pandemic dynamics prior to widespread vaccine distribution, COVID-19 variants and at-home testing. In addition, rigorous disease surveillance and reporting procedures were in place during the beginning of the pandemic, resulting in comprehensive infection data.

Poisson count regression with the population size as an offset is a popular approach for modeling count data and incidence rates.
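To make the role of the population offset concrete, the following is a minimal sketch of a Poisson regression with log(population) as an offset and a single binary covariate, fitted by Newton-Raphson. It is an illustration only, not the authors' spatial GEE: the function name `fit_poisson_offset`, the covariate `post_holiday`, and the toy numbers are all hypothetical, and the model ignores spatial correlation and zero-inflation.

```python
import math

def fit_poisson_offset(y, x, pop, iters=50):
    """Fit log E[y_i] = log(pop_i) + b0 + b1 * x_i by Newton-Raphson."""
    # Start the intercept at the crude overall rate; start the slope at zero.
    b0 = math.log(sum(y) / sum(pop))
    b1 = 0.0
    for _ in range(iters):
        # Fitted means under the current coefficients (offset enters as pop_i).
        mu = [p * math.exp(b0 + b1 * xi) for p, xi in zip(pop, x)]
        # Score vector of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for yi, mi, xi in zip(y, mu, x))
        # Fisher information matrix (2 x 2).
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system information * delta = score.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-12:
            break
    return b0, b1

# Hypothetical toy data (not from the paper): four county-days.
cases = [12, 30, 8, 25]
post_holiday = [0, 1, 0, 1]          # 1 = day falls in a post-holiday window
population = [10000, 10000, 8000, 8000]

b0, b1 = fit_poisson_offset(cases, post_holiday, population)
irr = math.exp(b1)                   # incidence rate ratio for the indicator
print(f"estimated IRR: {irr:.3f}")
```

With one binary covariate, the fitted incidence rate ratio coincides with the ratio of crude rates in the two groups, here (55/18000)/(20/18000) = 2.75, which gives a quick sanity check on the fit.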
Based on the reporting guidelines of the COVID-19 datasets, certain dates, such as holidays and weekends, could affect whether cases are reported [12]. These reporting practices have resulted in excess or structural zeros in the case count, also known as zero-inflation [13]. We also need to consider spatial correlation among county-level COVID-19 case counts because vector-borne and transmissible diseases such as COVID-19 exhibit non-negligible spatial correlation, as the movement of people can spread the virus to nearby counties [14]. Furthermore, processes that are confounded by spatially correlated variables are also suited for spatial models; a well-studied example is disease and pollution [15–17]. Thus, we expect to see similar case numbers or trends among neighboring counties [18].

Mixed models are powerful tools for spatial modeling due to their ability to handle a complex spatial correlation structure, usually represented as a semivariogram or kriging process [19, 20]. Mixed modeling problems have been addressed from a convex optimization and Bayesian computation perspective [14, 21]. Correlation in spatial epidemiology can also be captured using conditional autoregressive models [22–24]. Inferential summaries from these models assume a correctly specified spatial correlation; otherwise, a post-estimation robust covariance can be derived to address misspecified correlation. One such class of robust estimators is the heteroskedasticity-consistent or sa (...truncated)
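The two modeling ingredients named above, zero-inflation and distance-based spatial correlation, can be sketched as follows. This is a hedged illustration under stated assumptions, not the paper's specification: the excerpt does not give the semivariogram's functional form, so an exponential form with correlation exp(-d / phi) is assumed here, and the names `exp_correlation`, `zip_pmf`, `phi`, and `pi`, as well as the distances, are hypothetical.

```python
import math

def exp_correlation(dist, phi):
    """Correlation matrix with entries exp(-d_ij / phi).

    Assumed exponential form: it corresponds to a semivariogram
    gamma(d) = sill * (1 - exp(-d / phi)) with no nugget effect.
    """
    return [[math.exp(-d / phi) for d in row] for row in dist]

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson probability of observing count k.

    With probability pi the count is a structural zero (e.g. no reporting
    that day); otherwise it is drawn from a Poisson(lam) distribution.
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    structural_zero = pi if k == 0 else 0.0
    return structural_zero + (1.0 - pi) * poisson

# Hypothetical pairwise distances (km) between three counties.
dist_km = [
    [0.0, 40.0, 150.0],
    [40.0, 0.0, 120.0],
    [150.0, 120.0, 0.0],
]
R = exp_correlation(dist_km, phi=50.0)   # phi is an assumed range parameter

for row in R:
    print([round(r, 3) for r in row])
print("P(Y = 0) under ZIP:", zip_pmf(0, lam=3.0, pi=0.2))
```

In this sketch nearby counties receive a working correlation close to one and distant counties a correlation close to zero, while the zero-inflated distribution places more mass at zero than a plain Poisson with the same rate.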


Article home page: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0279371

Benny Ren, Wei-Ting Hwang. Modeling post-holiday surge in COVID-19 cases in Pennsylvania counties, PLOS ONE, 2022, Volume 17, Issue 12, DOI: 10.1371/journal.pone.0279371