ANONYMOUS REVIEWER REPORTS AND AUTHOR RESPONSE TO REVIEWERS

A Spatiotemporal Epidemiological Prediction Model to Inform County-Level COVID-19 Risk in the United States

As the COVID-19 pandemic continues to worsen in the United States, it is of critical importance to develop a health information system that provides timely risk evaluation and prediction of COVID-19 infection in communities. We propose a spatiotemporal epidemiological forecast model that combines a spatial cellular automaton (CA) with a temporal extended susceptible-antibody-infectious-removed (eSAIR) model under time-varying state-specific control measures. This new toolbox enables the projection of county-level COVID-19 prevalence over the 3,109 counties in the continental United States, including t-day-ahead risk forecasts and the risk associated with a travel route. In comparison to existing temporal risk prediction models, the proposed CA-eSAIR model informs governments and residents of the projected county-level risk, the local coronavirus spread patterns, and the associated personal risks at specific geolocations. Such high-resolution risk projection is useful for decision-making on business reopening and resource allocation for COVID-19 testing.


Editor's comment: Given the time constraints, what are the corners you cut and what are the likely consequences of such cutting (e.g., under-predicting, under-assessing uncertainties)?
Our response: We very much appreciate the flexibility allowed for the revision. Despite the significant time constraints, we remain committed to addressing all major critiques in the review.
Here is the list of improvements that we have made in this round of revisions.
• We expanded the classical SIR model to a new eSAIR model by adding a compartment for antibody status. To our knowledge, this eSAIR model has not been studied in the infectious disease literature before and is a new contribution addressing the unique situation of the COVID-19 pandemic in the US. The extension with an antibody compartment is viable because of the coronavirus serological test surveys, and more such surveys may become available in the future. The novelty is clear: it is the first stochastic infectious disease model that integrates public health survey data into the modeling of infectious disease dynamics and prediction.
• We expanded the SIR model to incorporate social distancing as a transmission rate modifier. The novelty of this expansion lies in the use of the time-varying efficacy of social distancing, evaluated from real-time data captured by mobile devices. We thank the several research institutes in the US that provide timely, data-driven estimates of the effects of various social distancing policies across states.
• We developed an improved procedure to determine inter-county mobility and connectivity. We now use data relevant to personal mobility, such as the percentage of people making out-of-county trips and the nearest neighboring airport, instead of only the simple geo-distance used in the previous version of the paper.
• We addressed uncertainty by propagating the uncertainty of the estimated model parameters into the prediction. Based on the MCMC method, estimation uncertainties, including those associated with the estimated prevalence and the estimated proportion of people with antibodies, can be readily assessed. We demonstrated how such uncertainty leads to uncertainty in the prediction.
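The propagation of parameter uncertainty into the prediction can be sketched as follows. Everything here is a hedged illustration: the posterior draws are synthetic, the projection is a minimal deterministic SIR step rather than the paper's CA-eSAIR machinery, and `project_prevalence` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)

def project_prevalence(beta, gamma, s0, i0, days):
    # Minimal discrete-time SIR forward projection (illustrative only;
    # the paper's CA-eSAIR projection is far richer than this sketch).
    s, i = s0, i0
    path = []
    for _ in range(days):
        new_inf = beta * s * i
        new_rem = gamma * i
        s, i = s - new_inf, i + new_inf - new_rem
        path.append(i)
    return np.array(path)

# hypothetical posterior draws of (beta, gamma); real draws come from the MCMC fit
betas = rng.normal(0.25, 0.02, size=500)
gammas = rng.normal(0.10, 0.01, size=500)

# push every posterior draw through the projection, then summarize quantiles
paths = np.array([project_prevalence(b, g, 0.99, 0.01, 7)
                  for b, g in zip(betas, gammas)])
lo, hi = np.percentile(paths, [2.5, 97.5], axis=0)  # 95% prediction band per day
```

Each row of `paths` is one projected trajectory, so the spread across rows directly reflects the estimation uncertainty carried into the prediction.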
As seen, we have tried our best to address all major concerns in the review and to cut as few corners as possible. Nevertheless, we did make some approximations and assumptions in the proposed method due to data limitations and time constraints. Here is a list of things that we could do better.
• In our model we assume that an infected person who recovers from the infection is immune to the coronavirus within the period of time considered for risk prediction. This assumption is very likely to hold but has not yet been verified.
• Our prediction presented in this paper is based on the limited serological survey data from New York State; it can be greatly improved in the near future as more states conduct similar surveys for antibodies against the coronavirus. Nevertheless, our CA-eSAIR model provides a toolbox ready to incorporate such results.
• In addition to the self-immunization rate, there are two other coefficients that need to be specified: a temporal transmission rate modifier and a spatial inter-county connectivity coefficient. These two coefficients are specified using findings from other research institutes based on mobile device data. Much room exists for future improvement of these two coefficients.
in the model selections.
Our response: We agree with you that using limited data from public surveillance databases to learn a complex spatiotemporal dynamic system of COVID-19 infection is subject to much uncertainty, and that addressing this uncertainty is of great importance. In the proposed system, uncertainty comes from two major sources.
• The first kind is the specification of the transmission rate modifier due to social distancing, the self-immunization rate based on the limited antibody test survey results, and the inter-county connectivity coefficient estimated from mobile device data. These quantities are not estimated but rather specified from external sources of information, which are themselves subject to uncertainty. In general, model selection for these three functions is difficult due to a lack of adequate data. We do include a tuning step in the specification of the inter-county connectivity function by minimizing a one-day-ahead prediction error.
• The second kind is the MCMC estimation uncertainty for the model parameters in the proposed eSAIR model.
Our response: We very much appreciate your advice about the potential limitations and caveats in our methodology and findings. In the revision, we have exercised extra caution in our conclusions and discussion. Essentially, this paper provides a spatiotemporal prediction model, an analytic toolbox that practitioners may use to perform their own analyses. In addition, we explicitly state the assumptions and specifications of the model components throughout the paper.
We are very thankful for your insightful comments on our work, which have helped us improve the manuscript. Below we provide point-by-point responses to each of your comments, beginning with yours in italics.

AE's comment:
The issue of unreported cases seems to throw a wrench in the whole SIR construction.
Our response: We appreciate your critique. Indeed, we fully agree with the reviewers that both the numbers of infections and deaths are underreported in publicly available databases due to the limited capacities of data collection. In the revision we have tried to tackle this problem by accounting for self-immunization via the development of personal antibodies to the coronavirus.
One important insight into the under-reporting of infected cases is that the current health surveillance system has difficulty capturing asymptomatic individuals or those with light symptoms, and in providing sufficient resources for COVID-19 diagnostic tests (RT-PCR). Fortunately, after the submission of our manuscript, several US states released results of serological test surveys for herd immunity, including NY, CA, and MS. Although these serological surveys are small in scale and limited to some isolated counties, their results provide useful information to correct for the under-reporting of infections. Thus, in this revision we introduce a new antibody compartment to extend the existing eSIR model, termed the eSAIR model, which is used to estimate the state-level probabilities of an individual being susceptible, self-immunized, infected (prevalence), or removed on the date of the last available observation; these estimates serve as the initial values utilized by our proposed CA-eSAIR model to project county-level risk. Without constraints of time and resources, extensive county-level surveys of the proportion of the population with COVID-19 antibodies would help our CA-eSAIR model make better community-level risk predictions. In this revision, given these limitations and the fact that NY has had the antibody test survey with the highest coverage so far, we focus in detail on county-level prediction in New York State as an illustration of the application of our CA-eSAIR model.
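A minimal deterministic sketch of one SAIR-type compartment update may help fix ideas. The names `alpha_t` (self-immunization rate) and `pi_t` (transmission rate modifier) are illustrative stand-ins for the paper's time-varying quantities; the actual eSAIR model is stochastic and fit by MCMC, so this is only a conceptual sketch.

```python
def sair_step(s, a, i, r, beta, gamma, alpha_t, pi_t):
    """One discrete-time SAIR-type update on population fractions (s+a+i+r = 1).

    s: susceptible, a: antibody (self-immunized), i: infected, r: removed.
    """
    new_inf = pi_t * beta * s * i   # transmission, damped by social distancing
    new_imm = alpha_t * s           # susceptible -> self-immunized (antibody)
    new_rem = gamma * i             # infected -> removed
    return (s - new_inf - new_imm,  # S
            a + new_imm,            # A
            i + new_inf - new_rem,  # I
            r + new_rem)            # R

# example: one step from mostly-susceptible initial fractions (hypothetical values)
state = sair_step(0.95, 0.02, 0.02, 0.01, beta=0.3, gamma=0.1, alpha_t=0.01, pi_t=0.6)
```

The key structural point is the extra flow from S to A, which is what lets serological survey information enter the dynamics.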
In regard to the under-reporting of deaths from COVID-19, this is not only a public health surveillance issue but also a fundamental ascertainment issue in epidemiology. In fact, many ordinary deaths cannot be attributed to a defining cause with high certainty. This problem becomes even more complicated in the case of deaths from the coronavirus. Since the medical diagnosis of "COVID-19 disease" is not clinically well-defined, the number of deaths is always a questionable figure. In addition, there are anecdotally reported cases of at-home deaths potentially linked to the coronavirus, but these are believed to be rather isolated incidents in the US and should not substantially affect the analysis results.
In summary, to address the issue of data quality, particularly under-reporting, we expand our model by adding an antibody compartment, and the subsequent analysis makes use of survey results from COVID-19 tests in NY. We did not address the issue of death ascertainment and certification, i.e., whether or not a death is attributed to the coronavirus. Being a pathological problem, the latter is beyond the scope of this paper.

AE's comment:
Control measures change the regime of the disease propagation, but this does not seem to be taken into account.
Our response: Thank you for pointing out another important issue in our prediction model. Indeed, it is difficult in general to determine an objective connectivity coefficient given our limited knowledge and the limited data sources available. This problem itself could define a research area, and we would welcome any new ideas and approaches to improve the specification of this inter-county connectivity function. That being said, in this revision we have tried our best to improve the previous geo-distance-based connectivity. The improvement lies in the inclusion of two additional factors in the function: 1) the percentage decrease in encounter density relative to the national baseline, obtained from the social distancing scoreboard of the Unacast company (https://www.unacast.com/covid19/social-distancing-scoreboard) based on human mobility data; and 2) information on US airports (e.g., annual enplanements) and their accessibility from each county. In addition, we introduce a factor to tune the scale of the travel distance by minimizing the one-day-ahead prediction error. In the future, we hope to collaborate with experts in this field to further improve the definition of the connectivity coefficient.
Our response: Indeed, the problem of estimating the under-reporting rate is difficult. In the revision we have tried to tackle this problem by accounting for self-immunization via the development of personal antibodies to the coronavirus, under the assumption that the substantial majority of infections with no hospitalization (and thus not recorded in the database) self-recover with antibodies. Along this line, we extend our model by adding a new antibody compartment, termed the eSAIR model, which is used for the prediction.


Referee's comment: The model makes no provision for the introduction of control measures.
These would change the basic parameters (and hence R0).
Our response: Thank you for pointing out this issue. In theory, it is possible for a compartment probability to become larger than one or smaller than zero, especially in the case of a very long-term risk prediction, say a half-year ahead, since we then have a substantially large number of terms in the summation. Given that the COVID-19 pandemic evolves so fast, with constantly varying regimes of the disease, we would not consider risk predictions longer than one month, so in practice this technical issue is very unlikely to occur. Nevertheless, to be technically correct, we confine all compartment probabilities within [0, 1] in our software package.
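The confinement to [0, 1] amounts to a simple clip of any out-of-range values. This is a sketch of the idea with hypothetical numbers, not the package's actual code:

```python
import numpy as np

# hypothetical projected compartment probabilities that drifted out of range
probs = np.array([-0.02, 0.37, 0.61, 1.04])

# confine every probability to the valid interval [0, 1]
clipped = np.clip(probs, 0.0, 1.0)
```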

Referee's comment: Provide a justification for equations (4) & (5). Is there an interpretation for this risk score, or is it just a heuristic measure?
Our response: We define the risk of infection in an intuitive way as a cumulative chance over the prediction period: P(infection at or before day t) = P(infection at day 1) + P(infection at day 2 | no infection at day 1) + … + P(infection at day t | no infection before day t). We have added a remark in the paper about the intuition behind this definition.
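Under the usual chain-rule reading of this definition, each conditional term is weighted by the probability of remaining uninfected up to that day, so the cumulative risk equals the complement of a running survival product. A sketch under that interpretation, with hypothetical daily probabilities:

```python
import numpy as np

def cumulative_risk(daily_probs):
    """Cumulative chance of infection at or before each day.

    daily_probs[k] is taken as P(infection on day k | no infection before day k);
    the cumulative risk at day t is then 1 - prod_{k<=t} (1 - daily_probs[k]).
    """
    survival = np.cumprod(1.0 - np.asarray(daily_probs, dtype=float))
    return 1.0 - survival

risk = cumulative_risk([0.01, 0.02, 0.015])  # risk at or before days 1, 2, 3
```

By construction the risk is nondecreasing in t, which matches the intuition of a cumulative chance.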

Referee's comment:
For Section 3.1, which relates to my second concern: there is a gap in the literature when it comes to the estimation of R0 after controls were introduced. If you could obtain and report such an estimate, I think it would be very useful.
Our response: As noted above, in the proposed eSAIR model we introduce a transmission rate modifier function to adjust the transmission rate using mobile device data. Therefore, the estimate of the basic reproduction number R0 is adjusted for control measures and human interventions. A detailed introduction to the new eSAIR model with a time-varying transmission rate modifier is given in Section 2.1.
We would like to express our gratitude for your review and careful reading of our paper. Your insightful comments helped us greatly improve the manuscript. Below are point-by-point responses to each of your comments, presented in italics.

Referee's comment:
If I have understood everything right, the CA-SIR model seems to incorporate transmission from neighboring counties in an entirely arbitrary way, which is not driven by data, and even worse, doesn't even seem to be tunable. At its most basic, we end up with:
Our response: Thank you for your insightful critique, which raises several important points; we address them one by one below.
• The CA-eSAIR model (formerly the CA-SIR model) is a model for risk prediction based on the results obtained from fitting the state-level eSAIR model. In principle, the eSAIR model could be fitted with county-level data if such data were available in good quality. In reality, the county-level data in most counties are sparse and less reliable (e.g., an infected person living in one county may die in a hospital located in another county). Thus, it is more robust to run an epidemiological model on a relatively large population, such as a state, where the preventive measures are supposed to be homogeneous across the counties within the state.
• We totally agree with you that there is a certain level of subjectivity in the specification of the inter-county connectivity coefficients, which requires knowledge beyond the domain of public health. However, a good specification of this coefficient is needed for county-level prediction using the spatiotemporal CA-eSAIR model. Indisputably, the better the quantification of inter-county connectivity, the better the prediction of county-level infection risk. A best quantification of this coefficient may not exist, but better versions do. One of the major focal areas of this revision is to improve the definition of the connectivity coefficient in order to reduce its subjectivity (or arbitrariness).
• Our improvement is made in three domains: 1) we borrow information on inter-county mobility derived from the percentage decrease in encounter density relative to the national baseline, obtained from the social distancing scoreboard of the Unacast company (https://www.unacast.com/covid19/social-distancing-scoreboard) based on human mobility data; 2) we incorporate the locations and annual enplanements of over 300 commercial airports in the continental US into the definition of inter-county range, which combines both geo-distance and air-distance information; and 3) we add a tuning parameter to obtain an optimal scaling of the inter-county range by minimizing the one-day-ahead prediction error. Therefore, we have greatly reduced the arbitrariness in the formation of the inter-county connectivity. We also note that a more desirable data-driven specification of the connectivity coefficient is itself a large research topic, beyond the scope of this paper.
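The scale-tuning step in point 3) can be sketched as a one-dimensional grid search over a scaling parameter. Everything below is a hypothetical stand-in for the paper's actual connectivity function: the exponential distance decay, the toy distances, and the prevalences are all synthetic, for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10
dist = rng.uniform(10, 500, size=(n, n))      # hypothetical inter-county ranges (km)
dist = (dist + dist.T) / 2                    # make the range matrix symmetric
np.fill_diagonal(dist, 0.0)
prev = rng.uniform(0.0, 0.05, size=n)         # current county prevalences
obs_next = prev * 1.05                        # hypothetical next-day observations

def predict_next(scale):
    w = np.exp(-dist / scale)                 # distance-decay connectivity weights
    w /= w.sum(axis=1, keepdims=True)         # row-normalize the weights
    return w @ prev                           # neighbor-weighted one-day-ahead prediction

scales = np.linspace(10.0, 1000.0, 100)
sspe = np.array([np.sum((predict_next(s) - obs_next) ** 2) for s in scales])
best_scale = scales[int(np.argmin(sspe))]     # scale minimizing one-day-ahead SSPE
```

The same grid-search idea applies regardless of the particular functional form chosen for the connectivity weights.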

Referee's comment:
The standard of the writing is really rather poor. I started correcting this, but I realized there were just too many errors; I think this manuscript really needs to go to a copy editor before it is published.
Our response: We appreciate this suggestion and would welcome the help of a copy editor to improve the writing.

Referee's comment: In the introduction the authors state that COVID-19 is one of the most lethal communicable infectious diseases. This may be true in terms of a fully susceptible population, but I don't think it's true on an individual level.
Our response: You raise an interesting question.
• First, yes, we use a Bayesian framework with MCMC to fit the eSAIR model. One could run MCMC in the spatiotemporal CA-eSAIR model for all 3,109 counties, but we did not do so because, as in our response to your first point, the county-level data in most counties are sparse and less reliable. Thus, it is more robust to run an epidemiological model on a relatively large population using state-level data.
• Second, we agree with you that the quantification of prediction uncertainty is of great importance, and it has been added in the revision. In the MCMC framework, we can calculate 95% credible intervals for these parameters from 200,000 MCMC draws. In principle, given no time constraints, we could project 200,000 risk scores using the CA-eSAIR model, each from one MCMC draw from the eSAIR model, and thereby assess the prediction uncertainty. Since this prediction is done at the county level, where each projection involves 3,109 counties, multiplying by a factor of 200,000 leads to an extremely high computational cost. To simplify the calculation, we instead demonstrate the propagation of estimation uncertainty into risk prediction by letting the 95% credible intervals of the compartment probabilities carry their uncertainties over to the projected risk. This is a simple solution that manifests the uncertainty in the risk projection.

Is it the Euclidean distance between county centroids? The minimum distance between counties?
Our response: The geo-distance we calculate between two counties is the geodesic distance, i.e., the shortest distance between two points on the surface of an ellipsoidal model of the Earth. The default algorithm used for the calculation is given by Karney (2013).
Our response: We totally agree with you on the need to exercise caution in reporting our prediction results, which are indeed obtained under various assumptions, in addition to the issue of data quality. Two major limitations discussed in the discussion section are as follows.
We did make some approximations and assumptions in the proposed method due to limited data and time constraints. Here is a list of things that we could do better.
• In our model, we assume that an infected person who recovers from the infection is immune to the coronavirus within the period of time considered for risk prediction. This assumption is very likely to hold but has not yet been verified.
• The prediction presented in this paper is based on the limited serological survey data from New York State, which can be greatly improved in the near future as more states conduct similar coronavirus antibody surveys. Nevertheless, our CA-eSAIR model provides a toolbox to incorporate such results.
• In addition to the self-immunization rate, there are two other coefficients that need to be specified: a temporal transmission rate modifier and a spatial inter-county connectivity coefficient. These two coefficients are specified using findings from research institutes based on mobile device data. Much room exists for future improvement of these two coefficients.

Referee's comment: What are the estimates used for the basic reproduction number R0 and the model parameters? Are they posterior means? Why is there no uncertainty interval reported?
Our response: Yes, the model parameters are estimated by their posterior means, and we use these posterior means to calculate the basic reproduction number R0. The uncertainty for R0 is given in Table 1, where the uncertainties for the parameter estimates are also listed.
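In a plain SIR parameterization, R0 = β/γ. Assuming that relation (the paper's eSAIR additionally involves a time-varying transmission modifier, so this is only a hedged sketch), the plug-in estimate from posterior means and a draw-wise credible interval can be computed side by side. The draws below are synthetic, not the paper's posterior:

```python
import numpy as np

rng = np.random.default_rng(3)
beta_draws = rng.normal(0.25, 0.02, size=10_000)   # hypothetical posterior draws of beta
gamma_draws = rng.normal(0.10, 0.01, size=10_000)  # hypothetical posterior draws of gamma

r0_plug_in = beta_draws.mean() / gamma_draws.mean()  # R0 from posterior means
r0_draws = beta_draws / gamma_draws                  # draw-wise R0 values
ci_low, ci_high = np.percentile(r0_draws, [2.5, 97.5])  # 95% credible interval
```

Computing R0 per draw, rather than only from the posterior means, is what makes an uncertainty interval for R0 immediately available.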
We would like to express our gratitude for your review and careful reading of our paper. Your insightful comments helped us greatly improve the manuscript. Below are point-by-point responses to each of your comments, presented in italics.
Our response: Thank you for these insightful remarks on various issues related to data quality. We agree with you that the issue of under-reporting is closely related to testing strategies and rates.

Referee's comment:
More than two months after the first reported COVID-19 case in the US, the testing selection bias has been greatly mitigated by the better availability of tests via drive-through testing stations and walk-in testing clinics, although such issues no doubt remain part of the sampling bias. Below we list a few points addressed in the revision in response to your critique.
• We extend the eSIR model to an eSAIR model by including a new antibody compartment. This extension makes it possible to incorporate serological testing survey results into the infection dynamics system, accounting for the proportion of people who were infected but did not get the chance to take a coronavirus RT-PCR diagnostic test and are now self-immunized with COVID-19 antibodies. We believe this new antibody compartment substantially addresses the under-reporting issue raised in your remarks.
• Indeed, we fit the eSAIR model state by state to estimate state-specific model parameters, under the assumption that the testing rate is homogeneous within a state. Although the testing rate is heterogeneous across counties within a state, such inter-county differences are deemed much smaller than inter-state differences. We added comments in the revision (see the fourth paragraph of Section 2.2) to address this issue.

Referee's comment: It would have been of interest to see how well the model actually performs, using predictions from, say, March 15 or March 30, and then comparing predicted with observed counts.
Our response: Thank you for this good suggestion. In this revision, we report the sum of squared prediction errors (SSPE), where each error is the difference between the one-day-ahead predicted and observed proportions of infections. See Figure A3, and see the additional correspondence with the Dataviz Editor.
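The SSPE as described is straightforward to compute; the numbers below are hypothetical, for illustration only:

```python
import numpy as np

pred = np.array([0.012, 0.034, 0.008])   # hypothetical one-day-ahead predicted proportions
obs = np.array([0.011, 0.036, 0.007])    # hypothetical observed proportions

sspe = float(np.sum((pred - obs) ** 2))  # sum of squared prediction errors
```

Because the proportions themselves are small, the SSPE is necessarily a very small number, which is worth keeping in mind when judging its magnitude.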

Referee's comment: The grammar errors and typos.
Our response: We have tried our best to identify and correct the grammar errors and typos, including those mentioned in the review reports by the other referees. The final version of the revised manuscript has been proofread by a native speaker.
We would like to express our gratitude for your review and careful reading of our paper. Your insightful comments helped us greatly improve the manuscript. Below are point-by-point responses to each of your comments, presented in italics.

Referee's comment: One weakness of the paper is the ad-hoc method of parameter estimation.
Non-spatial SIR models are fit separately for each state with some adjustment made for counties above the average infection level in the state. No assessment is made of parameter uncertainty and the effect it will have on predictions.
Our response: We appreciate your critique, which helps us clarify our analysis strategies used in the paper.
• Our parameter estimation is done systematically and consistently using the classical state-space model and a standard MCMC algorithm. This Bayesian estimation method in the framework of state-space models has been extensively developed and widely applied in the literature since the early 1990s. We did not propose new parameter estimation methods.
• We have clarified the reasons for running the state-specific eSAIR model in the fourth paragraph of Section 2.2: (i) testing strategies and rates differ considerably across states; and (ii) fitting a county-level spatiotemporal model is challenging due to insufficient county-level data in some counties. Given the potential data quality issues, the state-level analysis provides a more reliable estimation of the model parameters.
• Using the Bayesian framework, we can obtain 95% credible intervals to quantify the estimation uncertainty from 200,000 MCMC draws. In principle, given no time constraints, we could project 200,000 risk scores using the CA-eSAIR model, each from one MCMC draw from the eSAIR model, and thereby assess prediction uncertainty. Since this prediction is done at the county level, where each projection involves 3,109 counties, multiplying by a factor of 200,000 leads to an extremely high computational cost. To simplify the calculation, we instead demonstrate the propagation of estimation uncertainty into risk prediction by letting the 95% credible intervals of the compartment probabilities carry their uncertainties over to the projected risk. This is a simple solution that manifests the uncertainty in the risk projection.

Referee's comment: Is this spatiotemporal model more accurate than fitting individual SIR models to each state?
Our response: We did fit both scenarios and found that the spatiotemporal CA-eSAIR model gives SSPE = 4.13e-08 at the state level, while the state-level eSAIR model gives SSPE = 1.59e-06 at the state level. The former reduces the prediction error by a factor of about 38.5. See the additional correspondence with the Dataviz Editor.

Referee's comment:
The results produced in Figure 4 do not correspond closely to the raw rates mapped in https://coronavirus.jhu.edu/us-map. This paper's risk estimates change abruptly at state borders, most notably in Idaho and Nebraska, whereas the observed rates on https://coronavirus.jhu.edu/us-map show more changes within states than at state borders. The north-east of New York state has observed fairly low incidence, whereas the risk in Figure 4 is in the highest bin. In their response, the authors state "Unfortunately, we do not know if the discrepancies over neighboring counties on the state borders are artifact." I must conclude that the biggest signal the model is producing, large state-level changes, is indeed an artifact and the results produced in this paper must be viewed with suspicion.
Our response: We appreciate your point of view on the issue of non-smooth projected risk scores over some counties along the state borders. Thinking more about a potential solution overcoming abrupt changes across state borders, we realize that this may not be that simple and is certainly beyond the capacity of our current methodology. We regard this as a limitation in this paper. This limitation pertains to the initial values generated from a state-level eSAIR analysis in that we assume both control measures and testing policies/strategies are state-specific. Such within-state homogeneity is also used in the prediction. Consequently, the resulting intrastate projected risks seem to be more homogeneous than the inter-state projected risks, and some counties at state boarders appear to have noticeable discrepancies in their projected risks. It is of interest in a future work to discern the true inter-state differences from artifacts in the risk prediction over the border counties. To do this, we may fit the eSAIR model for the border counties in addition to the current analysis with within-state counties. Alternatively, we may perform a local smoothing (e.g. spatial moving average) for the risk scores of counties at state borders if the need of this procedure is deemed necessary, judged by a certain objective criterion. This deserves a further exploration. We have added this additional discussion in the conclusion section of the paper (see the second last paragraph) to address the inter-state differences in the projected risks.  Figure A5." That's a really good comment of the referee, but the answer is strange, since there is no Figure A5. I'm also puzzled by the SSPE value. It seems low, until you remember that it is a sum of squared small values and only for a 1-day ahead forecast. I would need more information to judge whether this is good or not (a map of the differences would be good). 
Since the paper discusses 7-day-ahead forecasts, they should also carry out a 7-day-ahead test.
Our response: This is a typo, which has been corrected; it is actually Figure A3. We have included a new figure (Figure A3b) in the revision to show the squared error values of the one-day-ahead risk prediction for each county in New York State. This example illustrates the tuning results.
We appreciate your suggestion of carrying out a 7-day-ahead test. Although it is possible to do so, we chose instead to report a one-day-ahead test in Figure A3b, out of consideration for the prediction uncertainty. In our view, validation based on a single prediction value may not be very rigorous. Given that the pandemic in the US evolves at a fast pace, with much uncertainty and heterogeneity, a prediction interval that quantifies the prediction uncertainty is a better validation approach in this context. It requires, however, substantially more computing resources to calculate a posterior prediction interval, which will be studied in our future projects.

(4) Our response: Thanks for these valuable suggestions. We have incorporated them (in fact two major points) in the revision.
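As a toy illustration of the posterior prediction interval mentioned above: given posterior draws of a county's one-day-ahead prevalence, the interval is simply a pair of quantiles of those draws. The draws below are simulated stand-ins, not output from the actual CA-eSAIR sampler.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in for posterior draws of a county's one-day-ahead
# prevalence; in the real model these would come from the MCMC sampler.
draws = rng.normal(loc=2e-4, scale=5e-5, size=4000)

point = float(np.median(draws))              # single point prediction
lo, hi = np.percentile(draws, [2.5, 97.5])   # 95% prediction interval

# Validating an observed value against the interval (lo, hi), rather than
# against the point value alone, accounts for prediction uncertainty.
```

The computational cost referred to in the response comes from having to generate and store such draws for every county and forecast day, not from the quantile step itself.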

Editor's comment: As for
• In regard to the first suggestion on the prediction error, we have included test results for the 1- to 7-day-ahead prediction accuracy of the estimated infection prevalence at the county level. The following table gives the averaged squared error of the predicted county-level infection prevalence for each day from May 3 to May 9. The prediction errors are all on the order of 10⁻⁵, namely one case difference in the cumulative number of infections per 10,000 people in a county. For example, for an average-sized county of 100,000 people (in fact 97,118 in our data), with data up to May 2, the predicted total number of infections on May 3 is within about 20 cases of the actual observed number of confirmed infections, and in most cases our predicted numbers are larger than the reported ones. The prediction error increases over time due to the increased uncertainty, although the prediction errors within the first 3 days are very close. This prediction error should be interpreted with caution, due to the under-reporting issue in the number of confirmed cases in a county. Since our prediction model incorporates the antibody rate in both the estimation and the prediction of prevalence, the predicted number of infections includes both symptomatic and asymptomatic cases. In contrast, the observed data available in the public database contain only the number of confirmed symptomatic cases (no asymptomatic cases). We therefore expect that the error rate would be even smaller had the number of asymptomatic cases been available in the test data. We added the above discussion at the top of page 13 (above Figure 6).

• To ensure the convergence of the MCMC, we set the adaptation number to 10⁴, thinned the chain by keeping one of every 10 random draws to reduce autocorrelation, set a burn-in period of 2 × 10⁵ draws to let the chains stabilize, and started from 4 separate chains of length 5 × 10⁵.
Thus, in total we had 2 × 10⁵ effective draws, in that about 2 × 10⁶ draws were discarded. Moreover, we monitored trace plots of all the relevant model parameters to check the quality of mixing of the MCMC draws and the convergence of the algorithm. The draws after burn-in were also exported for additional checks using the CODA R package. Below, we provide an example of the trace plot and density estimate for the basic reproduction number R₀ for your perusal. In our view, the MCMC draws look good. We added a short description of this practice in the revision (the paragraph below Table 1).

Dataviz Editor's comment:

Our response: We did the following revisions.
1) Per the suggestion by the dataviz editor, in this round of revision we have adopted the weighted absolute prediction error (WAPE) to replace the previous (unweighted) average squared prediction error. The weight is the ratio of a county's population size to the total population size of all counties. We agree with you that this WAPE, adjusted by county population size, has a more appropriate magnitude and is easier to interpret. See the highlighted changes on page 19.
2) To see how the change of the prediction-error metric may affect the selection of the tuning parameter, below we included Table 5 (not included here but available in the original letter to the editor), which showed the performance of three different one-day-ahead prediction errors: the weighted absolute prediction error (WAPE), the (unweighted) average squared prediction error (ASPE), and the weighted squared prediction error (WSPE). Note that ASPE was used in the previous version, WAPE is the new metric used in the current version, and WSPE is what the previous ASPE would be if weighted by county population size. Overall, the optimal tuning parameter can be selected with each criterion, with stable numerical performance. We did not include this table in the paper, as it did not seem an essential piece of the empirical study, and omitting it avoids confusion. We prefer to stay with WAPE for ease of interpretation.
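The three criteria compared above follow directly from their definitions: absolute or squared county-level errors, either averaged equally or weighted by each county's population share. A minimal sketch, using made-up toy county values rather than the paper's data:

```python
import numpy as np

def wape(pred, obs, pop):
    """Weighted absolute prediction error: absolute prevalence errors
    weighted by county population share."""
    w = pop / pop.sum()
    return float(np.sum(w * np.abs(pred - obs)))

def aspe(pred, obs):
    """Unweighted average squared prediction error."""
    return float(np.mean((pred - obs) ** 2))

def wspe(pred, obs, pop):
    """Squared prediction error weighted by county population share."""
    w = pop / pop.sum()
    return float(np.sum(w * (pred - obs) ** 2))

# Toy prevalence values for three hypothetical counties.
obs  = np.array([1.0e-4, 2.0e-4, 5.0e-4])
pred = np.array([1.2e-4, 1.8e-4, 5.5e-4])
pop  = np.array([50_000, 100_000, 250_000])

errors = (wape(pred, obs, pop), aspe(pred, obs), wspe(pred, obs, pop))
```

The population weighting means a given error in a populous county contributes more than the same error in a small one, which is what makes WAPE's magnitude interpretable at the level of people rather than counties.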
3) To illustrate the prediction accuracy, per your suggestion, we included Figure 8, which shows the nationwide 7-day-ahead WAPEs (Panel A) along with the county population sizes (Panel B).
4) In addition, per your suggestion, we added Figure 9 to illustrate the densities (similar to histograms) of the WAPEs over the 7-day prediction period, namely May 3-9.

Editor's comment:

Our response: We have added this reference to our paper to facilitate the discussion of the prediction error. We agree with you that "small" errors do not necessarily mean more accurate; rather, the WAPE is only a relative metric, which should be interpreted with caution, mainly because of biases in surveillance data collection and infection case ascertainment. See the additional discussion in the closing sentences of the second paragraph on page 19.

Dataviz editor's comment:

Our response: Thanks for your good suggestion. In this revision, we adopted the average absolute prediction error weighted by county population, called the weighted absolute prediction error (WAPE), in the paper. The weighted average is based on all the counties within the 39 states that passed the MCMC convergence diagnosis; these states have experienced a more severe COVID-19 pandemic, so their data are relatively abundant and the model fits well. The initial values of the other states are given by the national average estimates in the risk prediction. Per your suggestion, we included Figures 8 and 9 to illustrate the prediction accuracy. Figure 8 shows the nationwide 7-day ahead