Estimating Probabilities of Success of Clinical Trials for Vaccines and Other Anti-Infective Therapeutics

A key driver in biopharmaceutical investment decisions is the probability of success of a drug development program. We estimate the probabilities of success (PoS) of clinical trials for vaccines and other anti-infective therapeutics using 43,414 unique triplets of clinical trial, drug, and disease between January 1, 2000, and January 7, 2020, yielding 2,544 vaccine programs and 6,829 non-vaccine programs targeting infectious diseases. The overall estimated PoS for an industry-sponsored vaccine program is 39.6%, and 16.3% for an industry-sponsored anti-infective therapeutic. Among industry-sponsored vaccines programs, only 12 out of 27 disease categories have seen at least one approval, with the most successful being against monkeypox (100%), rotavirus (78.7%), and Japanese encephalitis (67.6%). The three infectious diseases with the highest PoS for industry-sponsored non-vaccine therapeutics are smallpox (100%), CMV (31.8%), and onychomycosis (29.8%). Non-industry-sponsored vaccine and non-vaccine development programs have lower overall PoSs: 6.8% and 8.2%, respectively. Viruses involved in recent outbreaks---MERS, SARS, Ebola, Zika---have had a combined total of only 45 non-vaccine development programs initiated over the past two decades, and no approved therapy to date (Note: our data was obtained just before the COVID-19 outbreak and do not contain information about the programs targeting this disease.) These estimates offer guidance both to biopharma investors as well as to policymakers seeking to identify areas most likely to be undeserved by private-sector engagement and in need of public-sector support.


Introduction
In this paper, we provide estimates of the historical probabilities of success (PoS) of clinical trials for vaccines and other therapeutic drugs for infectious diseases to inform discussions on the planning and financing of the fight against one of humanity's oldest foes. This is of particular importance in light of the recent havoc wreaked by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes coronavirus disease . While the probabilities of success of therapeutic drugs for various disease groups like oncology are well-documented (Abrantes-Metz et al., 2004;DiMasi et al., 2010;Hay et al., 2014;MIT Laboratory for Financial Engineering, 2020;Smietana et al., 2016;Thomas et al., 2016;Wong et al., 2019bWong et al., , 2019a, relatively little has been published on treatments for infectious diseases and vaccine development despite their importance (Davis et al., 2011;Pronker et al., 2013). Prior studies have focused on narrower subsets relevant to their specific interests and have relied on much more limited data sets. For example, Young et al.
(2020) employed 10 to 25 data points per estimated value from the Bill and Melinda Gates Foundation to estimate the PoS of vaccines for neglected diseases, and DiMasi et al. (2020) reported PoS estimates on a per-drug basis using 2,575 trials for diseases of interest to the Gates Foundation. In contrast, we employ a much larger and broader dataset of 16,328 unique clinical trials to estimate the PoS of vaccines and non-vaccine therapeutics targeting 29 different infectious diseases using all available drug/indication pairs-a methodology that has been argued to be more relevant for evaluating drug development R&D risk and productivity (Hay et al., 2014;Thomas et al., 2016;Wong et al., 2019b).
Vaccination is commonly recognized as one of the most cost-effective public health measures for combatting infectious diseases (André, 2002;Ehreth, 2003;Kieny & Girard, 2005;OECD, 2013;Pronker et al., 2013;Rémy et al., 2015). In developed countries, routine prophylactic vaccination and effective treatment options have led to the control or complete elimination of several deadly infectious diseases through individual and herd immunity, preventing millions of deaths and untold suffering each year. This prophylaxis dramatically reduces the burden on the healthcare system and society as a whole. In addition, the deaths, hospitalizations, and treatment costs avoided by these measures have led to significant economic savings (Ehreth, 2003;Rémy et al., 2015; US Department of Health and Human Services, 2017).
As technology continues to advance, one expects that the human species will be better able to cope with these diseases. The fact remains, however, that we still do not have effective treatments or vaccines for many infectious diseases. While the discovery of antibiotics has reduced the mortality rate of bacterial infection, and the development of the smallpox vaccine has led to the eradication of the devastating disease (World Health Organization, 1980), other infectious diseases, such as Acquired Immunodeficiency Syndrome (AIDS) and A drug development program is the clinical investigation of the use of a drug for a disease, typically consisting of sequential clinical trials, separated into phases. We make the assumption that each program must transition from phase 1 to phase 2 to phase 3 to approval. We say that a drug development program has reached phase i if it is observed, or can be inferred, that there is at least one trial in phase i. The probability of a drug development program transitioning from phase i to phase j (PoSij) can be computed using the simple ratio Nj/Ni, where Ni is the number of drug development programs initiated at phase i (where i = 1, 2, or 3) with known outcomes between phase i and phase j (where j = 2, 3, or "A" which denotes regulatory approval), and Nj is the number of drug development programs among the former that made it to phase j.
We call the estimated probability of a drug development program transitioning from phase i to phase i+1 the "phase i PoS", and the "estimated overall PoS" is defined as the estimated probability of a drug development program going from phase 1 to regulatory approval in at least one country. The estimated probability of a drug development program transitioning from phase 1 to approval-estimated directly using the method described above-is called the "path-by-path" estimate of the overall PoS, and is reported for all PoS calculations.
It should be emphasized that because of this treatment of in-progress drug development programs, path-by-path PoS estimates are not multiplicative, i.e., PoS PoS PoS A PoS A , in contrast to phase-by-phase estimates, which do multiply (see Wong, Siah & Lo (2019b) and https://projectalpha.mit.edu/faq for details and illustrative examples).
To simplify this terminology, we will henceforth omit the qualifier "estimated" when referring to the PoS, so it should be understood that all PoS values reported in this article are statistical estimates of unobservable population parameters.
8 Apr 2020 © 2020 by Wong, Siah, and Lo Page 3 of 25 All Rights Reserved When identifying phase transitions, we make the standard assumption that phase 1/2 and phase 2/3 trials are to be considered as phase 2 and phase 3, respectively. We report only drug development programs that have seen at least one trial with a definite outcome.

Data
We extracted clinical trials metadata from the January 7, 2020, snapshots of Citeline's PharmaProjects and TrialTrove databases, provided by Informa Pharma Intelligence. Clinical trial metadata was retrieved from the TrialTrove database while the approval data was obtained from the PharmaProjects database, both of which are required to identify the drug development programs. These databases contain information from both US and non-US sources. We consider a drug approved if it is approved in any country. All clinical trials used in this analysis have end dates after January 1, 2000, and start dates before January 7, 2020.
We filter our data to include only trials that have been tagged by Citeline as being in the 'Infectious Disease' or 'Vaccines (Infectious Diseases)' therapeutic areas. The vaccine types and diseases are provided by the databases. The database encodes each unique triplet of trial identification number, drug, and disease as a data point. As such, a single trial can be repeated as multiple data points. Since the two therapeutic areas may overlap in data points, we define clinical trials that are involved in any vaccine development as part of a 'vaccine' development program. In addition, we process the data such that more specific diseases (e.g., rabies) can be identified instead of broad vaccine classes (e.g., vector-borne disease vaccines). Clinical trials that are not involved in any vaccine development program will be deemed to be part of a 'non-vaccine' drug development program. We derive 43,414 data points in total. We define an 'industry-sponsored' development program as one where there is at least one commercial company involved in any stage of clinical development. The complement-in which there is no commercial company involved in any stage of the vaccine or drug development program-shall be referred to as 'non-industry-sponsored'.
We plot the number of development programs known to start in each month from January 2000 through December 2019 in Figure 1. There are 1,838 and 706 industrysponsored and non-industry-sponsored vaccine development programs, respectively, and, 3,851 and 2,978 industry-sponsored and non-industry-sponsored non-vaccine drug development programs targeting infectious diseases, respectively. As can be seen from Figure 1a, the number of industry-sponsored clinical programs attempting to treat infectious diseases is often greater than the number of vaccine development programs. The gap between the number of programs initiated for each development type seems to have increased between January 2000 and May 2014 before declining (see Section A7 for a more detailed analysis). We see a precipitous fall in the number of infectious disease treatment development programs initiated between late 2018 and mid-2019, which is likely to be related to declining investment in the research and development (R&D) of novel antibiotics, 8 Apr 2020 © 2020 by Wong, Siah, and Lo Page 4 of 25 All Rights Reserved precipitated by the closure of antibiotics biotechnology firms and the withdrawal of pharmaceutical companies from the antibiotics business (Hu, 2018;Langreth, 2019).
Between January 2000 and June 2011, the number of non-industry-sponsored vaccine development programs initiated is on par with the number of non-industrysponsored, non-vaccine anti-infective drug development programs initiated (see Figure 1b). However, the number of non-vaccine drug development programs initiated begin to outpace the number of vaccine development programs after January 2012, and experience a rapid boom between mid-2015 and mid-2018 before declining rapidly between October 2018 and January 2019.  Vaccine Treatment for infectious diseases . CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint

Vaccines
Overall, 2,544 vaccine development programs are observed in our dataset, of which 1,838 are sponsored by industry and 706 do not involve any industry sponsor in any stage of development. For industry-sponsored drug development programs, 'respiratory infections' is the most actively researched vaccine category, comprising 34.8% (n=640) of all vaccine development programs (see Figure 2). Hepatitis B virus (HBV) and human immunodeficiency virus (HIV) vaccines represent 11.6% (n=213) and 9.8% (n=181) of all vaccine development programs, respectively, whereas intra-abdominal infections, monkey pox, and severe acute respiratory syndrome (SARS) vaccines are the least researched fields, with only one development path observed per disease.
A similar pattern can be seen for the non-industry-sponsored vaccine development programs; excluding the 'others' category, the top three most researched vaccine categories are also respiratory infections (24.8%), HIV (20.4%), and HBV (8.2%), whereas Middle East Respiratory Syndrome (MERS) and SARS are the least researched diseases with one development program each. CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10.1101/2020.04.09.20059600 doi: medRxiv preprint 8 Apr 2020 © 2020 by Wong, Siah, and Lo Page 6 of 25 All Rights Reserved From Figure 3a, we can see that the overall PoS for industry-sponsored vaccine development programs is 39.6% (standard error, or SE: 1.2%), which is substantially higher than the average overall PoS of 11.0% (SE: 0.2%) across all industry-sponsored drug development programs (see Table 2 2020), despite the fact that the latter computed their estimates using a different method (a "phase-by-phase" approach) and considered only lead indications. We estimate PoS12, PoS23, and PoS3A to be 82.5% (SE: 0.9%), 65.4% (SE: 1.3%), and 80.1% (SE: 1.4%), respectively.
Across all industry-sponsored vaccine development programs, we can see that monkeypox vaccines have had the most developmental success, followed by rotavirus and Japanese encephalitis vaccines (see Figure 3). Their overall success rates are 100% (SE: 0%), 78.7% (SE: 5.2%), and 67.6% (SE: 8.0%), respectively. The PoS for monkeypox is based on only one sample. Only 12 diseases out of the 27 disease categories with at least one development path observed have seen at least one approved vaccine.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10.1101/2020.04.09.20059600 doi: medRxiv preprint 8 Apr 2020 © 2020 by Wong, Siah, and Lo Page 7 of 25 All Rights Reserved (25.0%, SE: 21.7%). The latter estimates are derived from only a handful of samples and must be interpreted with caution as their large standard errors suggest. . CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(a) PoS of industry-sponsored vaccine development programs (b) PoS of non-industry-sponsored vaccine development programs
is the (which was not peer-reviewed) The copyright holder for this preprint

Non-Vaccine Anti-Infective Therapeutics
In contrast to vaccines, which are intended to prevent disease, a number of alternatives have been developed to treat-and, in some cases, cure-patients afflicted with an infectious disease. According to our dataset, 3,851 and 2,978 industry-sponsored and non-industry-sponsored non-vaccine drug development programs, respectively, have been initiated in the area of infectious disease (see Figure 4). The top three diseases with the greatest number of industry-sponsored drug development programs are respiratory infections (21.8%), HIV (15.7%) and hepatitis C virus, or HCV (14.1%). Together, they comprise 51.6% of all industry-sponsored non-vaccine development programs. Nonindustry anti-infectious-disease drug development programs focus on treating respiratory infections (20.5%), HIV (13.9%), and bacteria skin infection (12.1%).
With respect to addressing the most recent virus outbreaks-MERS, SARS, Ebola, and Zika-a total of 9 industry-sponsored and 36 non-industry-sponsored non-vaccine drug development programs were initiated over the past twenty years, and there have been no approved therapies to date. CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint   From Figure 5a, we can see that the overall PoS across all industry-sponsored drug development programs treating infectious diseases is 16.3% (SE: 0.7%). The PoS12, PoS23, and PoS3A are 65.0% (SE: 0.8%), 64.3% (SE: 1.0%), and 51.1% (SE: 1.6%), respectively. Based on our data, the highest success rates for industry-sponsored non-vaccine development programs have been for smallpox (100.0%, SE: 0.0%), CMV infection (31.8%, SE: 7.0%), and onychomycosis (29.8%, SE: 6.7%). There are currently no non-vaccine therapies approved for rotavirus, SARS, rabies, Ebola, West Nile Virus (WNV), Marburg, yellow fever, chikungunya, MERS, monkeypox, or norovirus. With the exception of norovirus and MERS, these diseases without any vaccine are predominantly prevalent in nonindustrialized nations, and thus represent neglected diseases. It is also interesting that for the latter eight diseases, even the PoS12 is low. Since phase 1 trials in the development of anti-infective therapies focus primarily on safety, understanding the pharmacokinetics of the compound, and maximum tolerable dose levels, it can be inferred that the drugs tested are either of high toxicity or lack the necessary characteristics required for optimal absorption, distribution, metabolism, and excretion (ODME), or perhaps failed to advance due to financial constraints.
is the (which was not peer-reviewed) The copyright holder for this preprint

Industry-Sponsored Trials
In an attempt to shed more light on the industry-sponsored vaccine and non-vaccine drug development programs, we classify the diseases by their biological family and transmission type. The classifications are presented in Table 1 in the Supplementary Materials. We then compute the PoS using these classifications.
Looking at the vaccine PoS by transmission route (see Figure 6a), we see that vaccines for diseases transmitted through animal bites have the highest overall PoS (61.3%, SE: 4.7%), whereas no vaccine has been developed for diseases transmitted through contaminated food or water.
We find that companies have been most successful in developing non-vaccine treatments for diseases transmitted between humans through the air, with 50.0% (SE: 25.0%) of all drug development programs making it from phase 1 to regulatory approval (see Figure 6b). Unfortunately, this is based on only four drug development programs and may not be indicative of the general trend. Treatments for diseases that transmit through 'human to human (others)' have an overall PoS of 21.5% (SE: 1.2%) while no approval is observed for diseases transmitted through 'animal bites' or 'contaminated food or water'. . CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint

Discussion
Companies producing vaccines and other therapeutics for infectious diseases have gradually been retreating from these spaces in recent years. The number of companies producing vaccines has dwindled over the past few decades, and the top four vaccine companies now make up more than 90% of the global market (Evaluate, 2018). Similarly, . CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint It should be no surprise that investors are unwilling to invest in companies producing vaccines and treatments for infectious diseases given the economics of this market (Vu et al., 2020). These have been generally regarded as low-margin products, and they have low growth potential compared to treatments in other therapeutic areas, such as oncology or cardiovascular diseases. As a comparison, Revlimid, the blockbuster cancer drug for multiple myeloma, earned $9.69 billion for Celgene in 2019 (Celgene, 2019), whereas the vaccine and antiviral portfolios of GlaxoSmithKline, the top vaccine producer by sales, generated only $6.65 billion and $5.89 billion in revenues, respectively, in 2017 (Evaluate, 2018).
This lack of investment has resulted in a relatively low number of development programs for vaccines and treatments of infectious diseases. Only 5,689 industry-sponsored programs were initiated in the past two decades, a mere 12.5% of all industry-sponsored drug development programs launched in the same period.
Our study indicates that the technical success rate is unlikely to be a barrier to investments in new vaccines and treatments for infectious diseases, unlike cancer drugs, where the financial risk of new R&D projects comes from the reduced chance of bringing a drug-indication pair from phase 1 to market. The overall PoS of industry-sponsored vaccines and treatments for infectious diseases are above the average for all therapeutic groups (see Table 2 in the Supplementary Materials).
It is often suggested that the fundamental issue behind this lack of investment is that the market for vaccines and treatments for infectious diseases is simply not lucrative enough. Despite the expense of research and development and the need for large-scale production (Weir & Gruber, 2016), anti-infective disease treatments are used only occasionally, while vaccine companies face an avalanche of liability lawsuits (Hensley & Wysocki Jr., 2005). Furthermore, the companies are at the mercy of government pricing decisions (Hu, 2018).
It remains to be seen if more non-industry-sponsored research can alleviate the issue. Our study shows that only 6.8% (SE: 1.0%) and 8.2% (SE: 0.6%) of non-industry-sponsored vaccines and non-vaccine infectious disease development programs transition from phase 1 to approval, respectively. However, this may be a result of selection bias: promising vaccine and therapeutics initiated in non-industry settings are often pursued in conjunction with industry-sponsored sponsors, whereas commercially less promising projects are more likely to be pursued by non-profit organizations.
. CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint

Conclusion
The world today has never been in greater need of effective vaccines and other antiinfectives. As the COVID-19 crisis has shown, infectious diseases still have the potential to cause a catastrophically large number of deaths and disrupt the daily lives of billions. We hope that our research into the probability of successfully developing infectious disease therapeutics will inform all the stakeholders and catalyze innovation and greater investment in this critical and underserved field.
. CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10.1101/2020.04.09. CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

A2. PoS Tables across all therapeutic areas
We reproduce the probability of success across all therapeutic groups.

A7. Regression of the difference between the number of industry-sponsored investigational treatments initiated and industry-sponsored investigational vaccines initiated against time
We attempt to find correlations between the difference in the number of industry-sponsored clinical development programs initiated for treatments and the analogous number for vaccines (DIFF). We define 'TIME' as the number of months elapsed since January 1, 2000.
We test five regression models with the follow specifications: (A) DIFF = Constant + α1·TIME (B) DIFF = Constant + α1·TIME + α2·TIME 2 (C) DIFF = Constant + α1·TIME + α2·TIME 2 + α3·TIME 3 (D) DIFF = Constant + α1·TIME + α2·TIME 2 + α3·TIME 3 + α4·TIME 4 (E) DIFF = g(TIME), where g is a locally linear Gaussian kernel Models (A) to (D) are performed on Microsoft Excel 2019 while Model (E) is implemented using the statsmodel package in Python. The bandwidth for the non-parametric kernel regression is selected using the least-squares cross-validation method. The results of the regressions are reported in Table 11. Our best fitting model, Model E, indicates that the difference between the number of non-vaccine treatment development programs initiated All Rights Reserved and the number of vaccine programs initiated widened between January 2000 and May 2014 before narrowing. The difference between the number of initiated programs is close to zero in December 2019, possibly due to the withdrawal of pharmaceutical companies from the antibiotics business and bankruptcy of antibiotics companies.  . CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10. 1101/2020