
Curating a COVID-19 Data Repository and Forecasting County-Level Death Counts in the United States

Published on Feb 08, 2021

You're viewing an older Release (#1) of this Pub.

  • This Release (#1) was created on Nov 03, 2020.
  • The latest Release (#6) was created on Aug 17, 2022.


As the COVID-19 outbreak evolves, accurate forecasting continues to play an extremely important role in informing policy decisions. In this paper, we present our continuous curation of a large data repository containing COVID-19 information from a range of sources. We use these data to develop predictions and corresponding prediction intervals for the short-term trajectory of COVID-19 cumulative death counts at the county level in the United States up to two weeks ahead. Using data from January 23 to June 20, 2020, we develop and combine multiple forecasts using ensembling techniques, resulting in an ensemble we refer to as Combined Linear and Exponential Predictors (CLEP). Our individual predictors include county-specific exponential and linear predictors, a shared exponential predictor that pools data together across counties, an expanded shared exponential predictor that uses data from neighboring counties, and a demographics-based shared exponential predictor. We use prediction errors from the past five days to assess the uncertainty of our death predictions, resulting in generally applicable prediction intervals, Maximum (absolute) Error Prediction Intervals (MEPI). MEPI achieves a coverage rate of more than 94% when averaged across counties for predicting cumulative recorded death counts two weeks in the future. Our forecasts are currently being used by the nonprofit organization Response4Life to determine the medical supply needs of individual hospitals and have directly contributed to the distribution of medical supplies across the country. We hope that our forecasts and data repository can help guide necessary county-specific decision-making and help counties prepare for their continued fight against COVID-19.
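
The abstract describes MEPI only in words: an interval built from the largest prediction error seen over the past five days. One way this idea could be sketched is shown below. It is a minimal illustration, not the authors' implementation. The function name `mepi_interval`, the choice to normalize each error by the prediction, and the use of the last observed count as a floor for the lower bound (cumulative counts should not decrease) are all assumptions made here for the example.

```python
from typing import Sequence, Tuple


def mepi_interval(
    past_predictions: Sequence[float],
    past_observed: Sequence[float],
    new_prediction: float,
    window: int = 5,
) -> Tuple[float, float]:
    """Hypothetical sketch of a maximum-error prediction interval.

    past_predictions[i] is the forecast that was made for day i and
    past_observed[i] is the cumulative count actually recorded that day.
    """
    if len(past_predictions) != len(past_observed):
        raise ValueError("prediction and observation histories must align")
    if len(past_predictions) < window:
        raise ValueError("need at least `window` days of history")

    # Keep only the most recent `window` days (five in the abstract).
    recent = list(zip(past_predictions, past_observed))[-window:]

    # Assumption: measure each day's error relative to the prediction,
    # guarding against division by a zero or near-zero forecast.
    errors = [abs(obs - pred) / max(abs(pred), 1.0) for pred, obs in recent]
    max_error = max(errors)

    # Assumption: cumulative deaths cannot fall below the last recorded
    # count, so that count is used as a floor for the lower bound.
    last_observed = past_observed[-1]
    lower = max(new_prediction * (1.0 - max_error), last_observed)
    upper = new_prediction * (1.0 + max_error)
    return lower, upper


if __name__ == "__main__":
    preds = [100.0, 110.0, 120.0, 130.0, 140.0]
    obs = [100.0, 99.0, 120.0, 130.0, 140.0]
    print(mepi_interval(preds, obs, new_prediction=200.0))
```

In this toy history the worst relative error over the five days is 10%, so a new forecast of 200 deaths gets an interval of roughly 180 to 220. Using the maximum rather than an average error makes the interval deliberately conservative, which fits the high coverage rate the abstract reports.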

Just Accepted - Preview

11/3/20: To preview this content, click below for the Just Accepted version of the article. This peer-reviewed version has been accepted for its content and is currently being copyedited to conform with HDSR’s style and formatting requirements.

Scott Coston:

Is the published data necessary & sufficient to provide deeper insights?

Much of the COVID data is problematic:

COVID testing occurs only when someone is symptomatic. Random testing of the population would provide more insight, since many people are asymptomatic. It would also be useful for identifying characteristics of those who have a low incidence of infection (if such a group exists).

The medical community is not testing for the Delta variant, which means a LOT of assumptions are being made, and reliable data on the Delta variant do not appear to exist.

NOTE: understanding the following topics requires data that is generated by the medical and scientific communities. I need to re-read the paper to see if these sources are included and what data is collected. It seems likely that mathematical analysis of data that pertains to mutations and natural vs vaccinated immunity would be instrumental for greater understanding, which is essential as the pandemic continues.

Exploring mutations with data

“The reason (RNA viruses) are thought to be so successful is because their polymerases make mistakes,” Denison said. “They lack the ability to correct mistakes, so they generate mutant swarms of viruses that are ready for adaptation in different environments.”

The leaky vaccines being deployed may compound the mutation problem. With the Delta variant, nasopharyngeal viral replication is prodigious in both vaccinated and unvaccinated hosts. Viral replication is estimated at 1,000 times that of the original strain. Is it possible to ascertain whether vaccinated hosts provide an ideal environment for the development of vaccine-resistant mutations? *If* this phenomenon exists, is it conceptually similar to the development of antibiotic-resistant bacteria?

Natural vs Vaccine immunity

Do people who have recovered from COVID have a more robust immune response to Delta because they have IgA from the infection? IgA concentrates on the nasal mucosa, which may be effective at reducing the viral load. Vaccinated people do *not* have this IgA component. Understanding this phenomenon better would be very beneficial. Having the data to form a hypothesis might be invaluable for treatment protocols.