
Evaluating Personalized (N-of-1) Trials in Rare Diseases: How Much Experimentation Is Enough?

Published on Sep 08, 2022

Abstract

For rare diseases, conducting large, randomized trials of new treatments can be infeasible due to limited sample size, and it may answer the wrong scientific questions due to heterogeneity of treatment effects. Personalized (N-of-1) trials are multi-period crossover studies that aim to estimate individual treatment effects, thereby identifying the optimal treatments for individuals. This article examines the statistical design issues of evaluating a personalized (N-of-1) treatment program in people with amyotrophic lateral sclerosis (ALS). We propose an evaluation framework based on an analytical model for longitudinal data observed in a personalized trial. Under this framework, we address two design parameters: length of experimentation in each trial and number of trials needed. For the former, we consider patient-centric design criteria that aim to maximize the benefits of enrolled patients. Using theoretical investigation and numerical studies, we demonstrate that, from a patient’s perspective, the duration of an experimentation period should be no longer than one-third of the entire follow-up period of the trial. For the latter, we provide analytical formulae to calculate the power for testing quality improvement due to personalized trials in a randomized evaluation program and hence determine the required number of trials needed for the program. We apply our theoretical results to design an evaluation program for ALS treatments informed by pilot data and show that the length of experimentation has a small impact on power relative to other factors such as the degree of heterogeneity of treatment effects.

Keywords: ALS, heterogeneity of treatment effects (HTE), minimally clinically important heterogeneity, patient-centered research, rare diseases, sample size formulae


1. Introduction

When managing chronic diseases and conditions, patients commonly try different treatments over time before finding the right treatments. The practice of N-of-1 trials operationalizes this type of patient-centered experimentation by randomizing treatments to single patients in multiple crossover periods, often in a balanced fashion. N-of-1 trials can be used to identify the optimal personalized treatment for single patients in situations involving evidence for heterogeneity of treatment effects (HTE) or the lack of a cure (Davidson et al., 2021). As such, these trials are sometimes called single-patient trials or personalized trials. First introduced by Hogben and Sim (1953), N-of-1 trials have recently been applied to treat rare diseases (Roustit et al., 2018), as well as common chronic conditions such as hypertension (Kronish et al., 2019; Samuel et al., 2019). The use of personalized (N-of-1) trials in treating rare diseases is particularly appealing because demonstrating comparative effectiveness of treatments at the population level via parallel-group randomized trials is often infeasible.

In this article, we consider personalized (N-of-1) trials of treatments for people with amyotrophic lateral sclerosis (ALS). ALS is a rare neurodegenerative disease that affects motor neurons in the brain and spinal cord. Despite the fact that two modestly effective disease-modifying medications have been approved for the treatment of ALS (Edaravone [MCI-186] ALS 19 Study Group, 2017), the disease has no cure, and thus, symptomatic treatments remain an important strategy to improve the quality of life in people with ALS (Mitsumoto et al., 2014). In particular, muscle cramps are disabling symptoms affecting over 90% of ALS patients, with demonstrated between-patient variability and yet stable manifestation of symptoms in a patient (Caress et al., 2016). Several treatments targeting muscle cramps have been evaluated and have shown mixed results, suggesting the presence of HTE or inadequate statistical power for definitive conclusions (Baldinger et al., 2012). Furthermore, ALS itself has been considered markedly heterogeneous in its pathogeneses, disease manifestations, and disease progression (Al-Chalabi & Hardiman, 2013; van den Berg et al., 2019). These are the clinical situations in which personalized (N-of-1) trials can help patients identify the best treatments for themselves (Kravitz et al., 2014).

Despite renewed interest in N-of-1 trials and numerous recent applications, the literature has offered little discussion on the evaluation of the usefulness of N-of-1 trials. As N-of-1 trials typically require active physician involvement, intense monitoring, and frequent data collection compared with usual care, these additional costs and resources warrant careful evaluation of effectiveness before said trials are used in practice as regular clinical service. The primary evaluation question is “Does the practice of N-of-1 trials in clinical care improve outcomes on the standard of care?” However, presuming the quality of treatment decisions based on N-of-1 trials is higher than what standard of care would prescribe, reports of N-of-1 trials often describe only the applications and results of the trials without plans to address the evaluation question. An exception is Kravitz et al. (2018) who compare N-of-1 intervention against the usual care for patients with musculoskeletal pain in a randomized fashion using data collected after experimentation ends and find no evidence of superior outcomes among participants undergoing N-of-1 trials. However, when planning the study, the authors had not considered the underlying model that accounts for variability and correlation in the longitudinal observations and the assumptions on the effect size, which would in turn drive the appropriate sample size of an evaluation program for N-of-1 trials. A design issue related to sample size determination is the duration of experimentation in N-of-1 trials. In this article, we propose a framework to evaluate the quality and effectiveness of N-of-1 trials and develop specific guidance to address these design issues. We will introduce the evaluation framework in Section 2 and define the basic analytical model for analyzing N-of-1 trials in Section 3. 
The main findings on the experimentation duration and sample size are derived and described in Section 4 and applied to the ALS treatment program in Section 5. The article ends with a discussion in Section 6. All technical details are provided in the Appendices.

2. An Evaluation Framework for Personalized (N-of-1) Trials

2.1. The Anatomy of a Personalized (N-of-1) Trial

We consider an evaluation program comparing the effectiveness of personalized (N-of-1) trials in treating muscle cramps in people with ALS relative to the institutional standard of care. Under the program, people with ALS will be randomized to receive personalized (N-of-1) trials that compare two standard drugs prescribed for muscle cramps, mexiletine and baclofen. In each trial, a patient will be given the two drugs sequentially over $T=18$ two-week treatment periods in two phases. The first phase consists of $m$ treatment periods (with $m < T$) when the two drugs are randomized in a multiple crossover fashion. This phase shall be referred to as the experimentation phase. In the remaining $T-m$ treatment periods, the patient will continue with a drug treatment selected based on data in the experimentation phase. This phase shall be referred to as the validation phase (Figure 1).

Figure 1. Schema of an evaluation program for personalized (N-of-1) trials comparing treatment A and treatment B. Under the evaluation program, patients are randomized to either an N-of-1 trial or the standard of care.

During the treatment periods, the Columbia Muscle Cramp Scale (MCS) will be collected weekly to result in two MCS measurements for each period: one at the end of week 1 and one at the end of week 2. The MCS is a validated, composite score summarizing the frequency, severity, and clinical relevance of cramps in people with ALS (Mitsumoto et al., 2019). While the study does not include washout periods between treatments, only the measurement at the end of each two-week period will be used in the primary analysis in order to avoid carryover effects of the drugs.

Sandwiched between the two treatment phases is a feedback period where the MCS data in the experimentation phase are reviewed with the treating physician and the patient. The feedback period enables data-driven treatment decisions by providing the stakeholders with data visualization as well as numerical comparison (Davidson et al., 2021).

2.2. Standard of Care

In this article, we focus on a randomized controlled evaluation program where patients are randomized between an N-of-1 trial and standard of care (SOC). As depicted in Figure 1, a patient under SOC will be given either mexiletine or baclofen for 36 weeks, corresponding to the 18 two-week treatment periods in the N-of-1 trials, and will have the same follow-up schedule as the N-of-1 trial patients. Treatments in the ‘experimentation phase’ will be determined by the treating physicians. The ‘feedback period’ in the SOC arm may be viewed as a sham intervention and be conducted as a regular clinic visit before the patient continues into the ‘validation phase’ with the same drug in the remaining $T-m$ treatment periods. By the virtue of randomization, MCS collected in the validation phase under SOC will serve as the control data and allow for an unbiased comparison with the validation phase in the N-of-1 trial patients.

Let $p_0$ denote the probability that mexiletine will be prescribed under SOC and $p_1$ the probability baclofen will be prescribed such that $p_0 + p_1 = 1$. The special case $p_0=1$ and $p_1=0$ corresponds to a clinical scenario where mexiletine is considered the standard treatment. Generally, the program probability parameters $p_0, p_1$ are somewhere between 0 and 1 when no clear best treatment exists. A program equipoise may be defined as when the treating physicians will give either of the drugs with equal likelihood, that is, $p_0 = p_1 = 0.5$. These program parameters apparently affect the quality of treatment under standard of care, and hence the advantage of N-of-1 trials over standard of care. At the end of the evaluation program, these parameters can be estimated using the control data.

2.3. Design Parameters

While the study duration (or the number of treatment periods $T$) is determined based on feasibility and how long a patient can be followed in the evaluation program, an N-of-1 trial under the evaluation framework is defined by the length $m$ of the experimentation phase, and hence the length $T-m$ of the validation phase. Intuitively, the quality of the treatment decision by an N-of-1 trial improves with a larger $m$ as more data will be available during the feedback period. On the other hand, a long experimentation phase may place excessive burden on patients without benefitting them and imply a short validation phase for a given $T$. Rather than maximize accuracy, the experimentation length $m$ will respond to the question “How much experimentation is needed for an N-of-1 trial to be beneficial to an individual?”

A second design parameter is the specification of an analytical plan used to guide treatment selection during the feedback period. Principled statistical or data science methods should be employed to ensure the analysis is rigorous, while a prespecified plan entails preprogrammed algorithms that in turn facilitate quick feedback to the stakeholders.

Finally, as in conventional randomized controlled trials, the number of patients randomized in an evaluation program will need to be determined to ensure adequate statistical power for the primary evaluation question on whether N-of-1 trials improve outcomes.

To summarize, the design parameters that need to be prespecified at the planning stage of an evaluation program are the primary analysis plan used in the feedback period, the experimentation length ($m$) for each individual, and the number of individuals required. These will be discussed in the next two sections.

3. An Analytical Model for N-of-1 Trials

Let $y_{it}$ be the outcome of patient $i$ in treatment period $t$ and $x_{it} \in \{-1, 1\}$ be the corresponding treatment for $i=1, \ldots, n$ and $t=1, \ldots, T$. Without loss of generality, we assume a large value of the outcome $y_{it}$ is desirable. To put the notation in the context of our study, we let $y_{it}$ denote the negative value of MCS at the end of each two-week treatment period. For the treatments, baclofen is coded as $x_{it}=1$ and mexiletine as $x_{it}=-1$. In this article, we focus on balanced sequences between baclofen and mexiletine in the experimentation phase, that is, assuming

$$\sum_{t=1}^m x_{it} = 0. \tag{3.1}$$

Consider the outcome model

$$y_{it} = \alpha_i + \beta_i x_{it} + \epsilon_{it} \tag{3.2}$$

where $\beta_i$ is the patient-specific treatment effect and the noise terms $\epsilon_{it}$ are mean zero normal with $\text{cov}(\epsilon_{it}, \epsilon_{is}) = \rho_{st} \sigma^2$ and $\rho_{tt} = 1$. To reflect heterogeneous symptoms and HTE among the patients, we postulate $\alpha_i \sim N(\mu_A, \sigma_A^2)$ and $\beta_i \sim N(\mu_B, \sigma_B^2)$. The mean $\mu_B$ indicates the average treatment effect and the variance $\sigma_B^2$ indicates the extent of HTE in the disease population. While $\mu_B = 0$ represents the null scenario where there is no average treatment effect, a large value of $\sigma_B^2$ indicates the need for personalizing treatments.

Under model (3.2), the optimal treatment for patient $i$ can be expressed as $2 I(\beta_i > 0) - 1$, where $I(\cdot)$ is an indicator function. During the feedback period, we may present to patient $i$ an estimated treatment effect $\hat\beta_i$ based on the experimentation phase data $\{(x_{it}, y_{it}): t = 1, \ldots, m\}$ along with the estimated optimal personalized treatment for the patient:

$$x_i^* = 2 I(\hat\beta_i > 0) - 1. \tag{3.3}$$

Subsequently, in the event of perfect adherence to the analysis result, the patient will receive the estimated optimal treatment (3.3) in the validation phase, that is, $x_{it} \equiv x_i^*$ for $t = m+1, \ldots, T$.

Some practical notes on the choice of $\hat\beta_i$ are in order. For the purposes of providing quick feedback, a broad range of estimators can be considered. The theoretical results derived in the following sections will hold for any estimators that are approximately normally distributed with mean $\beta_i$ and some finite variance $\tau_i^2$. A simple example is the patient-specific least squares estimator $\hat\beta_i^{LS} = \sum_{t=1}^m x_{it} y_{it} / m$ for patient $i$. The least squares estimator is unbiased for the patient-specific treatment effect $\beta_i$ regardless of the variance-covariance structure of $\{\epsilon_{it}\}$, with variance

$$\tau_i^2 = \text{var}(\hat\beta_i^{LS} \mid \alpha_i, \beta_i) = \lambda_i \sigma^2 / m \quad \text{where } \lambda_i = 1 + \sum_{s \neq t} x_{is} x_{it} \rho_{st}/m. \tag{3.4}$$

Note that the conditional variance (3.4) is free of the patient-specific parameters $\alpha_i$ and $\beta_i$. For the purposes of planning an N-of-1 trial, we will focus on the use of least squares. However, in the actual analysis, if additional information is available to inform the appropriate correlation structure of the data, likelihood-based estimation or weighted least squares accounting for such structure may improve efficiency.
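As a concrete illustration of (3.1) to (3.4), the following minimal Python sketch draws a balanced sequence, simulates one patient under model (3.2) with independent noise, and computes the least squares estimate, the selected treatment, and the factor $\lambda_i$. The function names, the seed, and the patient values ($\alpha_i = -10$, $\beta_i = 2$) are hypothetical choices made for illustration only; they are not taken from the study.

```python
import random


def balanced_sequence(m, rng):
    """Random treatment sequence in {-1, +1} with sum zero, as required by (3.1)."""
    if m % 2 != 0:
        raise ValueError("m must be even for a balanced sequence")
    x = [1] * (m // 2) + [-1] * (m // 2)
    rng.shuffle(x)
    return x


def ls_estimate(x, y):
    """Patient-specific least squares estimate: sum_t x_t * y_t / m."""
    return sum(xt * yt for xt, yt in zip(x, y)) / len(x)


def lambda_factor(x, rho):
    """lambda_i in (3.4); rho[s][t] is the correlation between periods s and t."""
    m = len(x)
    cross = sum(x[s] * x[t] * rho[s][t]
                for s in range(m) for t in range(m) if s != t)
    return 1 + cross / m


rng = random.Random(0)                      # fixed seed, illustrative only
m = 6
x = balanced_sequence(m, rng)

# Hypothetical patient: values chosen for illustration, not estimated from data.
alpha_i, beta_i, sigma = -10.0, 2.0, 1.6
y = [alpha_i + beta_i * xt + rng.gauss(0, sigma) for xt in x]   # model (3.2)

beta_hat = ls_estimate(x, y)
x_star = 1 if beta_hat > 0 else -1          # estimated optimal treatment (3.3)

# Compound symmetry with a common correlation r between any two periods.
r = 0.3
compound = [[1.0 if s == t else r for t in range(m)] for s in range(m)]
lam = lambda_factor(x, compound)            # equals 1 - r for a balanced sequence

print(x, round(beta_hat, 3), x_star, round(lam, 3))
```

Because the sequence is balanced, the intercept $\alpha_i$ cancels in the estimator, which is why the estimate does not depend on the patient's baseline symptom level.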

4. How Much Experimentation Is Enough?

4.1. Patient-Centric Criteria and Length of Experimentation Phase

In this subsection, we discuss the choice of the experimentation length $m$ of an N-of-1 trial with respect to two different patient-centric criteria, both of which aim to maximize the benefits to patients on N-of-1 trials.

The first criterion is defined as the expected number of periods where a patient receives the optimal treatment. Mathematically, this criterion is denoted as $E(z_i)$, where $z_i$ is the number of periods in which patient $i$ receives the optimal treatment over the $T$ treatment periods.

Proposition 1.  Suppose $\hat\beta_i \sim N(\beta_i, \tau_i^2)$ under a balanced experimentation phase (3.1). Then for $0 < m \leq T$,

$$E(z_i) = \frac{m}{2} + (T-m) \, \text{Pr}\left( W \leq \frac{\left| \mu_B + \sigma_B U \right|}{\tau_i} \right)$$

where $W, U$ are independent standard normal variables. Furthermore, if $\mu_B = 0$, then

$$E(z_i) = \frac{m}{2} + (T-m) \, G\left( \sigma_B / \tau_i \right) \tag{4.1}$$

where $G$ is the cumulative distribution function of $W/|U|$, which is a pivotal distribution.
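Proposition 1 can be evaluated numerically with a short sketch. It relies on two facts that are not stated in the text above and are used here as working assumptions of the sketch: the ratio $W/|U|$ of independent standard normals follows a standard Cauchy distribution, so $G(x) = 1/2 + \arctan(x)/\pi$; and for general $\mu_B$ the probability can be written as $E_U[\Phi(|\mu_B + \sigma_B U|/\tau_i)]$, approximated below with a plain trapezoid rule. The function names and grid settings are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal: cdf is Phi, pdf is phi


def expected_optimal_periods_null(T, m, sigma_b, tau):
    """E(z_i) from (4.1) with mu_B = 0, using the standard Cauchy CDF for G."""
    g = 0.5 + math.atan(sigma_b / tau) / math.pi
    return m / 2 + (T - m) * g


def expected_optimal_periods(T, m, mu_b, sigma_b, tau, grid=4001, lim=8.0):
    """E(z_i) from Proposition 1 for general mu_B.

    Pr(W <= |mu_B + sigma_B U| / tau) is computed as the average of
    Phi(|mu_B + sigma_B u| / tau) over u ~ N(0, 1) by the trapezoid rule.
    """
    h = 2 * lim / (grid - 1)
    total = 0.0
    for k in range(grid):
        u = -lim + k * h
        weight = 0.5 if k in (0, grid - 1) else 1.0
        total += weight * STD.cdf(abs(mu_b + sigma_b * u) / tau) * STD.pdf(u)
    return m / 2 + (T - m) * total * h


# Illustrative values: T = 18 periods, m = 4, independent noise with sigma = 1.6.
T, m, sigma = 18, 4, 1.6
tau = sigma / math.sqrt(m)
closed_form = expected_optimal_periods_null(T, m, 1.6, tau)
numeric = expected_optimal_periods(T, m, 0.0, 1.6, tau)
print(round(closed_form, 3), round(numeric, 3))
```

Setting $m = T$ in either function returns $T/2$, the value obtained when the whole trial is spent on balanced experimentation.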

The second patient-centric criterion is defined as the expected average outcome of a patient during an N-of-1 trial. This criterion is denoted as $E(\bar y_i)$, where $\bar y_i = \sum_{t=1}^T y_{it} / T$ is the average outcome of the patient in all $T$ treatment periods.

Proposition 2.  Under the same condition as in Proposition 1, for $0 < m \leq T$,

$$E(\bar y_i) = \mu_A + \left(1 - \frac{m}{T} \right) \left[ \mu_B \left\{ 2 \Phi\left( \frac{\mu_B}{\sqrt{\sigma_B^2 + \tau_i^2}} \right) - 1 \right\} + \frac{2 \sigma_B^2}{\sqrt{\sigma_B^2 + \tau_i^2}} \phi\left( \frac{\mu_B}{\sqrt{\sigma_B^2 + \tau_i^2}} \right) \right]$$

where $\Phi$ and $\phi$ respectively denote the standard normal distribution function and density.
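The expression in Proposition 2 translates directly into code. The sketch below is a plain transcription under the assumption that $\tau_i$ is supplied by the caller; the function name and the example value of $\mu_A$ are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal: cdf is Phi, pdf is phi


def expected_average_outcome(T, m, mu_a, mu_b, sigma_b, tau):
    """E(ybar_i) from Proposition 2 for a trial with m experimentation periods."""
    s = math.sqrt(sigma_b ** 2 + tau ** 2)
    gain = (mu_b * (2 * STD.cdf(mu_b / s) - 1)
            + 2 * sigma_b ** 2 / s * STD.pdf(mu_b / s))
    return mu_a + (1 - m / T) * gain


# Illustrative values: mu_A = -10 is a made-up baseline on the negative MCS scale.
T, sigma = 18, 1.6
for m in (2, 4, 6, 12, 18):
    tau = sigma / math.sqrt(m)          # independent noise, so lambda_i = 1
    value = expected_average_outcome(T, m, -10.0, 0.0, 4.8, tau)
    print(m, round(value, 3))
```

The bracketed gain is nonnegative for any $\mu_B$, so the returned value never falls below $\mu_A$ and equals $\mu_A$ exactly when $m = T$.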

We can derive a few practical principles from Proposition 1 and Proposition 2. First, conducting an N-of-1 trial with an experimentation length $m < T$ is generally beneficial for the patient compared to experimentation in all $T$ periods. Specifically, we can derive from Proposition 1 that the patient will receive the optimal treatment at least half of the time, that is, $E(z_i) \geq T/2$ for all $m$, with the minimum attained when $m=T$. Analogously from Proposition 2, the expected average outcome will be no smaller than the population average, that is, $E(\bar y_i) \geq \mu_A$ for all $m$, and equality holds when $m=T$.

Second, we can derive from the propositions that $E(z_i)$ and $E(\bar y_i)$ are increasing in $\sigma_B$ under the null $\mu_B=0$. In other words, an N-of-1 trial becomes more beneficial to the patient when there is a larger variability in the treatment effects across patients.

Third, and importantly, it is instructive to consider the null case where $\mu_B = 0$ and the least squares estimator $\hat\beta_i^{LS}$ is used to make inference based on the experimentation phase data. Under these conditions, we can derive from Proposition 2 that the criterion $E(\bar y_i)$ is maximized at

$$m^* = \frac{2 T}{\sqrt{9 + 8 \xi_i T} + 3} \tag{4.2}$$

where $\xi_i = \frac{\sigma_B^2}{\lambda_i \sigma^2}$ and $\lambda_i$ is defined in (3.4). While Equation (4.2) gives the optimal length $m^*$ as a function of $\sigma_B$, $\sigma$, and $\lambda_i$, it provides some general guidance:

Main Result 1. The optimal experimentation length $m^*$ is less than one-third of the total N-of-1 trial duration from a patient’s perspective, that is, $m^* \lesssim T/3$.
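Equation (4.2) can be checked numerically. The sketch below treats $m$ as a continuous quantity, evaluates the $m$-dependent part of $E(\bar y_i) - \mu_A$ from Proposition 2 under $\mu_B = 0$ and $\tau_i^2 = \lambda_i \sigma^2 / m$, and compares a fine grid search with the closed form. The function names, the grid step, and the three values of $\sigma_B$ are illustrative assumptions.

```python
import math


def optimal_length(T, sigma_b, sigma, lam=1.0):
    """Closed-form m* from (4.2) with xi = sigma_B^2 / (lambda * sigma^2)."""
    xi = sigma_b ** 2 / (lam * sigma ** 2)
    return 2 * T / (math.sqrt(9 + 8 * xi * T) + 3)


def null_gain(m, T, sigma_b, sigma, lam=1.0):
    """m-dependent factor of E(ybar_i) - mu_A when mu_B = 0.

    The constant 2 * phi(0) is dropped because it does not affect the maximizer.
    """
    tau2 = lam * sigma ** 2 / m
    return (1 - m / T) * sigma_b ** 2 / math.sqrt(sigma_b ** 2 + tau2)


T, sigma = 18, 1.6
results = []
for sigma_b in (0.4, 1.6, 4.8):
    m_star = optimal_length(T, sigma_b, sigma)
    grid = [k / 1000 for k in range(1, T * 1000)]          # m in (0, T), step 0.001
    m_grid = max(grid, key=lambda m: null_gain(m, T, sigma_b, sigma))
    results.append((sigma_b, m_star, m_grid))
    print(sigma_b, round(m_star, 3), m_grid, m_star < T / 3)
```

As $\xi_i$ approaches zero the closed form tends to $2T/6 = T/3$ from below, which is the bound stated in Main Result 1; in practice $m$ must also be an even integer to keep the sequence balanced.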

4.2. Sample Size

In this subsection, we discuss how much experimentation is adequate in terms of the sample size enrolled to the evaluation program. We first define the quality of an N-of-1 trial as the expected health outcome under the estimated optimal treatment $x_i^*$ (3.3). Assuming perfect adherence to the analysis results in the feedback period, the quality of an N-of-1 trial can be defined as $E(y_i^*)$ where $y_i^* = \sum_{t=m+1}^T y_{it} / (T-m)$ and the expectation is taken with respect to the distributions of $\alpha_i, \beta_i$, $x_i^*$, and $\{\epsilon_{it}\}$. Analogously, we can define the quality of standard of care as $E(y_i')$ where $y_i'$ is the average outcome observed in the validation phase for a patient under SOC and the expectation is taken under the assumption that the treatment in the experimentation phase continues to the validation phase. The primary objective of the evaluation program is to compare the quality of an N-of-1 trial and the quality of SOC. This can be formulated into a hypothesis testing problem with

$$H_0: \Delta := E(y_i^*) - E(y_i') \leq 0 \ \text{ versus } \ H_1: \Delta > 0 \tag{4.3}$$

where $\Delta$ measures the degree of quality improvement due to N-of-1 trials defined over the patient population. The hypotheses (4.3) can in turn be tested using the regular $Z$-statistic:

$$Z = \frac{\sqrt{n} ( \bar{y}^* - \bar{y}' )}{\sqrt{v^* + v'}} \tag{4.4}$$

where $\bar y^*$ and $v^*$ are respectively the sample mean and the sample variance of $\{y_i^*\}$ in the $n$ patients randomized to an N-of-1 trial and $\bar y'$ and $v'$ are the sample mean and sample variance of $\{y_i'\}$ in the $n$ patients randomized to SOC. Using standard arguments gives the power of the $Z$-test

$$\text{Pr}(Z > c_{\alpha} \mid \Delta) \approx \Phi\left( \frac{\sqrt{n} \Delta}{\sqrt{\text{var}(y_i^*) + \text{var}(y_i')}} - c_{\alpha} \right) \tag{4.5}$$

where $c_{\alpha}$ is the upper $\alpha$th percentile of standard normal. In Appendix C, we derive the expressions for $\Delta$, $\text{var}(y_i^*)$, and $\text{var}(y_i')$ in (4.5) under the condition that $\tau_i^2 \equiv \tau^2$ for all $i$. This condition is met when the N-of-1 trial patients receive the same sequence $x_{it}$ in the experimentation phase or when $\{\epsilon_{it}\}$ has a specific variance-covariance structure. For example, under a compound symmetry, that is, $\rho_{st} \equiv \rho$, we can show $\lambda_i \equiv \lambda = 1 - \rho$, that is, having $\tau_i^2 \equiv (1-\rho) \sigma^2 / m$ for all $i$. Specifically, under the assumption that the SOC treatment $x_i'$ for a given patient is independent of the patient-specific treatment effect $\beta_i$, we have

$$\Delta = 2 \mu_B \left\{ \Phi\left( \frac{\mu_B}{\sqrt{\sigma_B^2 + \tau_i^2}} \right) - p_1 \right\} + \frac{2 \sigma_B^2}{\sqrt{\sigma_B^2 + \tau_i^2}} \phi\left( \frac{\mu_B}{\sqrt{\sigma_B^2 + \tau_i^2}} \right), \tag{4.6}$$

$$\text{var}(y_i^*) = \sigma_A^2 + \sigma_B^2 + \mu_B^2 - \{ \Delta + \mu_B (2p_1-1) \}^2 + \frac{\sigma^2}{T-m}, \tag{4.7}$$

and

$$\text{var}(y_i') = \sigma_A^2 + \sigma_B^2 + \mu_B^2 - \mu_B^2 (2 p_1 - 1)^2 + \frac{\sigma^2}{T-m}. \tag{4.8}$$

The above expressions account for population-level information about the treatments through the program parameter $p_1$. For example, if emerging evidence in the literature suggests slight advantage of $x_i=1$ over $x_i=-1$, we may assume the physicians in the program will select $x_i'=1$ with $p_1 > 0.5$. In Appendix C, we provide expressions analogous to (4.6)–(4.8) for the situations where the physician may prescribe treatment $x_i'$ with patient-specific knowledge in addition to the population-level parameter $p_1$. However, we note that using expressions (4.6)–(4.8) may adequately reflect the standard of care where treatments are chosen based on population-level information rather than patient-specific knowledge. Furthermore, under the independence assumption of $x_i'$ and $\beta_i$, the power expression depends only on model parameters $(\sigma_A, \sigma_B, \mu_B, \sigma, \tau_i)$ for which information may be available to provide preliminary estimates and the known design parameters $(p_1, m, n, T)$. Finally, under the null case $\mu_B=0$:

Main Result 2. All else being equal, the power to demonstrate quality improvement due to N-of-1 trials (vs SOC) increases as heterogeneity of treatment effects $\sigma_B^2$ increases.
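The power calculation in (4.5) to (4.8) can be sketched in a few lines of Python. This is a transcription of the displayed formulas under the assumption $\tau_i^2 \equiv \lambda \sigma^2 / m$ for all patients, with $n$ interpreted as the number of patients per arm; the function names and the `lam` argument are hypothetical conveniences, not part of the authors' software.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal: cdf is Phi, pdf is phi


def quality_improvement(mu_b, sigma_b, tau, p1):
    """Delta from (4.6)."""
    s = math.sqrt(sigma_b ** 2 + tau ** 2)
    return (2 * mu_b * (STD.cdf(mu_b / s) - p1)
            + 2 * sigma_b ** 2 / s * STD.pdf(mu_b / s))


def power(n, m, T, mu_b, sigma_a, sigma_b, sigma, p1=0.5, lam=1.0, alpha=0.05):
    """Approximate power (4.5) of the one-sided Z-test with n patients per arm."""
    tau = math.sqrt(lam * sigma ** 2 / m)           # (3.4) with a common lambda
    delta = quality_improvement(mu_b, sigma_b, tau, p1)
    base = sigma_a ** 2 + sigma_b ** 2 + mu_b ** 2 + sigma ** 2 / (T - m)
    shift = mu_b * (2 * p1 - 1)
    var_n1 = base - (delta + shift) ** 2            # (4.7)
    var_soc = base - shift ** 2                     # (4.8)
    c_alpha = STD.inv_cdf(1 - alpha)
    return STD.cdf(math.sqrt(n) * delta / math.sqrt(var_n1 + var_soc) - c_alpha)


# Illustrative call with sigma_A = 4.8, sigma = 1.6, T = 18, m = 4, n = 34.
for sigma_b in (1.6, 3.2, 4.8):
    print(sigma_b, round(power(34, 4, 18, 0.0, 4.8, sigma_b, 1.6), 3))
```

Holding the other inputs fixed, the printed power rises with $\sigma_B$, in line with Main Result 2.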

5. Numerical Illustrations: Application to ALS Patients

5.1. Optimal Length of Experimentation

We use the MCS natural history data in Mitsumoto et al. (2019) to inform the design of the evaluation program for people with ALS. Specifically, we fitted a random effects model to the data and obtained an estimate of $\sigma_A=4.8$ and $\sigma = 1.6$. For simplicity in illustration, we further assume that the within-subject noise is conditionally independent given the population-level parameters. Figure 2 plots the patient-centric criteria against different values of $\mu_B$, $\sigma_B$, and $m$ for $T=18$. While the two criteria adopt different metrics, they are maximized when $m$ is relatively small. In Figure 2 and in all $(\mu_B, \sigma_B)$ that we have considered (not shown here), the optimal values of $m$ range from 2 to 6 for both criteria. This is consistent with what Main Result 1 implies: $m^* \lesssim T/3 = 6$.

Figure 2. Patient-centric criteria vs experimentation length m under different values of μB and σB. Left: Expected number of optimal treatment periods vs m. Right: Expected average outcome (negative of MCS) of patient vs m.

5.2. Sample Size and Effect Size

Main Result 2 implies that $\sigma_B^2$ may be viewed as an effect size in power calculation, while the power also depends on other model parameters and design parameters. As in conventional practice, the choice of an effect size should be based on a clinically meaningful difference, whereas the other model parameters (e.g., $\sigma_A$, $\sigma$, etc.) may be based on pilot data if available. Figure 3 plots the power against $(n,m)$ for three different effect sizes $\sigma_B$ for a one-sided test at 5% significance. Under each effect size, we identify the smallest $n$ that achieves 80% power for any $m$ and obtain that the required $(n,m)$ are $(210,12)$, $(60,6)$, and $(34,4)$ respectively for $\sigma_B = 1.6$, 3.2, and 4.8. We note that under a small effect size $\sigma_B=1.6$, the required $m = 12$ is greater than $T/3$. In light of Main Result 1, we may instead adopt $(n,m) = (210,6)$ in order to maximize the benefits of the N-of-1 trials to the patients. The power of this modified design is 78%, which is slightly lower than the target 80%. Generally, we observe from Figure 3 that the impact of $m$ on power is relatively small compared to that of $n$ and $\sigma_B$ except when the effect size is small ($\sigma_B = 1.6$).
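The search over $(n,m)$ described above can be sketched as a double loop. The code assumes the null case $\mu_B = 0$, independent noise ($\tau^2 = \sigma^2/m$), even values of $m$ so that the experimentation phase can be balanced, and $n$ patients per arm; the function names and the search limit `n_max` are hypothetical. It returns the smallest $n$ reaching the target power and, for that $n$, the smallest qualifying $m$, which is one possible reading of the selection rule.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal: cdf is Phi, pdf is phi


def power_null(n, m, T, sigma_a, sigma_b, sigma, alpha=0.05):
    """Power (4.5) when mu_B = 0 and tau^2 = sigma^2 / m."""
    tau2 = sigma ** 2 / m
    delta = 2 * sigma_b ** 2 / math.sqrt(sigma_b ** 2 + tau2) * STD.pdf(0.0)  # (4.6)
    base = sigma_a ** 2 + sigma_b ** 2 + sigma ** 2 / (T - m)
    var_n1 = base - delta ** 2      # (4.7) with mu_B = 0
    var_soc = base                  # (4.8) with mu_B = 0
    z = math.sqrt(n) * delta / math.sqrt(var_n1 + var_soc) - STD.inv_cdf(1 - alpha)
    return STD.cdf(z)


def smallest_design(T, sigma_a, sigma_b, sigma, target=0.80, n_max=5000):
    """Smallest n (per arm) reaching the target power, with the smallest such even m."""
    for n in range(2, n_max + 1):
        for m in range(2, T, 2):
            if power_null(n, m, T, sigma_a, sigma_b, sigma) >= target:
                return n, m
    return None


design = smallest_design(18, 4.8, 4.8, 1.6)
print(design, round(power_null(design[0], design[1], 18, 4.8, 4.8, 1.6), 3))
```

For smaller effect sizes the power at the boundary designs sits very close to the 80% target, so small differences in rounding or in the assumed rule can shift the selected pair.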

To determine if a specific value of $\sigma_B$ corresponds to a clinically meaningful effect size, relating $\sigma_B$ to $\Delta$ using (4.6) may be useful, as $\Delta$ lives on the same scale as the measurement outcomes. In our application, a 3- to 4-point change on the MCS will represent a clinically meaningful shift. Based on the pilot data and assumptions, the effect sizes $\sigma_B = 1.6, 3.2, 4.8$ translate to a degree of quality improvement $\Delta = 1.2, 2.5, 3.8$ respectively. Thus, we set the sample size for this evaluation program at $n=34$ with four treatment periods (two on mexiletine and two on baclofen) based on the results for $\sigma_B = 4.8$. Generally, the minimally clinically important heterogeneity (MCIH, $\sigma_{B,\min}$) may be determined relating to the minimally clinically important change (MCID, $\Delta_{\min}$) using (4.6).
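Relating an MCID to an MCIH amounts to inverting (4.6) in $\sigma_B$. Under the null case $\mu_B = 0$, $\Delta = 2 \sigma_B^2 \phi(0) / \sqrt{\sigma_B^2 + \tau^2}$ is increasing in $\sigma_B$, so a simple bisection suffices. The sketch below assumes $\mu_B = 0$ and a known $\tau$; the function names, bracket, and tolerance are hypothetical.

```python
import math
from statistics import NormalDist

PHI0 = NormalDist().pdf(0.0)  # standard normal density at zero


def delta_null(sigma_b, tau):
    """Delta from (4.6) when mu_B = 0."""
    if sigma_b == 0:
        return 0.0
    return 2 * sigma_b ** 2 * PHI0 / math.sqrt(sigma_b ** 2 + tau ** 2)


def mcih(delta_min, tau, hi=1000.0, tol=1e-10):
    """Smallest sigma_B whose Delta reaches delta_min, found by bisection."""
    lo = 0.0
    if delta_null(hi, tau) < delta_min:
        raise ValueError("delta_min is not reachable within the search bracket")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if delta_null(mid, tau) < delta_min:
            lo = mid
        else:
            hi = mid
    return hi


# Illustrative values: sigma = 1.6 and m = 4 give tau = 0.8 under independent noise.
tau = 1.6 / math.sqrt(4)
for delta_min in (3.0, 4.0):           # the 3- to 4-point MCS range quoted above
    print(delta_min, round(mcih(delta_min, tau), 2))
```

A round trip through `delta_null` and `mcih` recovers the starting $\sigma_B$, which is a quick way to check the inversion for other choices of $m$ or $\sigma$.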

Figure 3. Power vs (n,m) for different values of σB with μB = 0, σA = 4.8, σ = 1.6, ρ = 0, and T = 18.

5.3. Power for Comparing to Fully Informed SOC

The calculations in the previous subsection assume the null case $\mu_B = 0$ under which the power (4.5) does not depend on the parameter $p_1$. Under a non-null case, the value of $p_1$ reflects how informed the practice is about the population-level treatment effect. For example, if evidence in the literature suggests $\mu_B>0$, an informed practice will prescribe $x_i'=1$ with $p_1 > 0.5$. Specifically, standard of care that is fully informed by the literature may correspond to $p_1 = \text{Pr}(\beta_i > 0) = \Phi(\mu_B/\sigma_B)$.

Table 1 shows that as the average treatment effect $\mu_B$ grows larger and a fully informed SOC practice prescribes $x_i'=1$ more often (i.e., larger $p_1$), the power to demonstrate quality improvement in (4.3) becomes smaller. On the one hand, this suggests that if there is overwhelming evidence favoring $x_i'=1$ over $x_i'=-1$ in the literature, conducting N-of-1 trials will have diminished effect provided that the standard of care is fully informed. On the other hand, even with a large average treatment effect $\mu_B = 2.4 = 0.5 \sigma_B$, the quality improvement due to N-of-1 trials is $\Delta > 3$, which is still clinically meaningful, and the power is still reasonably high (68%) in this sensitivity analysis. This suggests that evaluating N-of-1 trials is a worthwhile endeavor unless there is overwhelming evidence of a large average treatment effect.

Table 1. Quality improvement $\Delta$ and power for comparing to a fully informed SOC (standard of care) with $p_1 = \Phi(\mu_B/\sigma_B)$, for $n=34$, $m=4$, $T=18$, $\sigma_A = 4.8$, $\sigma_B = 4.8$, $\sigma=1.6$, and $\rho=0$.

| $\mu_B$ | $p_1 = \Phi(\mu_B/\sigma_B)$ | $\Delta$ | Power |
| --- | --- | --- | --- |
| 0 | 0.50 | 3.8 | 80% |
| 1.2 | 0.60 | 3.7 | 77% |
| 1.6 | 0.63 | 3.6 | 75% |
| 2.4 | 0.69 | 3.3 | 68% |
| 4.8 | 0.84 | 2.3 | 39% |
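The entries of Table 1 can be recomputed from (4.5) to (4.8) with $p_1 = \Phi(\mu_B/\sigma_B)$. The sketch below hard-codes the design stated in the table caption, takes $\tau^2 = \sigma^2/m$ (that is, $\rho = 0$), and interprets $n$ as the number of patients per arm; the function name and constants are hypothetical, and this is not the authors' online tool.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal: cdf is Phi, pdf is phi

# Design and model values from the table caption.
N, M, T = 34, 4, 18
SIGMA_A, SIGMA_B, SIGMA = 4.8, 4.8, 1.6
TAU2 = SIGMA ** 2 / M                       # rho = 0, so lambda = 1


def table_row(mu_b, alpha=0.05):
    """Return (p1, Delta, power) for a fully informed SOC at a given mu_B."""
    p1 = STD.cdf(mu_b / SIGMA_B)
    s = math.sqrt(SIGMA_B ** 2 + TAU2)
    delta = (2 * mu_b * (STD.cdf(mu_b / s) - p1)
             + 2 * SIGMA_B ** 2 / s * STD.pdf(mu_b / s))           # (4.6)
    base = SIGMA_A ** 2 + SIGMA_B ** 2 + mu_b ** 2 + SIGMA ** 2 / (T - M)
    shift = mu_b * (2 * p1 - 1)
    var_sum = (base - (delta + shift) ** 2) + (base - shift ** 2)  # (4.7) + (4.8)
    z = math.sqrt(N) * delta / math.sqrt(var_sum) - STD.inv_cdf(1 - alpha)
    return p1, delta, STD.cdf(z)                                   # (4.5)


rows = {mu_b: table_row(mu_b) for mu_b in (0.0, 1.2, 1.6, 2.4, 4.8)}
for mu_b, (p1, delta, pw) in rows.items():
    print(f"{mu_b:4.1f}  {p1:.2f}  {delta:.1f}  {100 * pw:.0f}%")
```

Under these assumptions the printed values should agree with Table 1 up to rounding, which also serves as a consistency check on the transcription of (4.6) to (4.8).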

The numerical results in this article, and power and the patient-centric criteria in general, can be computed using tools available at: https://roadmap2health.io/hdsr/n1power/.

6. Discussion

N-of-1 trials have been increasingly used as a design tool to bridge practice and science in rare diseases (Müller et al., 2021; Stunnenberg et al., 2018). However, the literature is missing concrete guidelines on N-of-1 designs as to how much experimentation is appropriate. A fundamental issue is the articulation of a framework that will facilitate the evaluation of the usefulness of N-of-1 trials. In this article, we introduce an evaluation framework and outline the basic elements in an evaluation program for N-of-1 trials—namely, an experimentation phase, a feedback period, and a validation phase. In the literature, the reporting of N-of-1 trials mostly focuses only on the results of the experimentation phase, where patients explore the different treatments sequentially under a rigorous clinical protocol such as randomization, blinding, and scheduled follow-up. The feedback period and the validation phase are the critical elements in the planning and the conduct of N-of-1 trials but are, unfortunately, often omitted in the description of the design and the analytical plan.

Specifically, the length of the validation phase, relative to that of the experimentation phase, should be given careful consideration. We have demonstrated theoretically and numerically that the optimal length of experimentation from the patient’s perspective should be no greater than one-third of the entire study duration. This implies a relatively long validation phase, suggesting the importance of reproducing the quality of the decisions due to N-of-1 trials with additional follow-up. Our theoretical results also provide guidance on how many patients are needed in order to adequately power for testing quality improvement. Importantly, the relative length of experimentation and validation has minimal impact on the power. In other words, little conflict exists between the goal of maximizing patient benefits and maximizing power.

The feedback period facilitates evidence-based treatment decisions using data measured in the experimentation phase. Summarizing the relative benefits of the treatments via a single numerical statistic is a pragmatic way to present such evidence, because the information can be objectively presented and quickly digested by stakeholders. We have developed design calculus based on model-based least squares estimation, which is quick to compute and produces unbiased estimates of patient-specific treatment effects under a broad range of scenarios. Other, more sophisticated model-based methods may be used to deal with more complex situations. For example, when we observe a high volume of outcome measures via wearable devices, we could extend model (3.2) to an autoregressive model with multiple observations per treatment period (Kronish et al., 2019). In practice, treatment decisions are likely determined based on the totality of evidence. For example, in situations where a treatment that apparently benefits a patient may have side effects, a possibly less effective treatment may be preferred if it is more easily tolerated. Considerations of multiple outcomes in the analysis during the feedback period will likely increase adherence and will warrant further empirical, domain-specific research. Overall, as the feedback period potentially changes the treatment decisions, and hence the outcomes, in the validation phase, it can be viewed as an integral part of the intervention component. We may thus experiment in a randomized fashion with different elements in the feedback period for different individuals: we may consider presenting different endpoints (e.g., muscle cramp or safety), using a single endpoint, a composite outcome, or multivariate endpoints, using different types of analyses (e.g., intent-to-treat vs. per-protocol), and asking patients for their satisfaction and preference (Cheung et al., 2020).

Some considerations, assumptions, and limitations for power calculation in conventional randomized controlled trials also apply to N-of-1 trials. First, power calculation involves the inputs of a number of nuisance model parameters (e.g., $\sigma_A$, $\sigma$) as well as the effect size ($\sigma_B^2$). While the effect size $\sigma_B^2$ should be determined based on clinical relevance, the other parameters ideally can be based on estimates from pilot data. However, in situations where robust pilot data are not available, a potentially useful strategy is to leverage the concepts of adaptive designs (U.S. Food & Drug Administration, 2019), whereby the model parameters are updated using interim data in the evaluation program and the updates in turn inform a reassessment of the degree of quality improvement and the sample size required.

Second, our derivations assume that patients in both arms comply with their treatments in the following sense: patients in the N-of-1 trials adhere to the estimated optimal treatments based on the experimentation phase data, and patients in the SOC continue with the same treatment as in the experimentation phase. If there is prior information about the noncompliance rate, power expressions can be derived accordingly under the proposed framework. However, from the viewpoint that the feedback period is part of the N-of-1 trial intervention, it should be designed to maximize adherence by choosing the outcomes and analyses that most reflect patient preference, as discussed in the previous paragraph. Third, approaches to deal with missing data should be prespecified and implemented during the feedback period. An advantage of using model-based estimation is that the model can also serve as the basis for multiple imputation. That being said, no statistical approach can replace a well-conducted trial that is characterized by good compliance with treatment and minimal missing data.


Disclosure Statement

This work was supported by grants R01LM012836 from the NIH/NLM, P30AG063786 from the NIH/NIA, UL1TR001873 from NIH/NCATS, and R01MH109496 from NIH/NIMH. Dr. Mitsumoto’s work was also supported by ALS Association, MDA Wings Over Wall Street, Spastic Paraplegia Foundation, Mitsubishi-Tanabe, and Tsumura. The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication. The views expressed in this paper are those of the authors and do not represent the views of the National Institutes of Health, the U.S. Department of Health and Human Services, or any other government entity.


References

Al-Chalabi, A., & Hardiman, O. (2013). The epidemiology of ALS: A conspiracy of genes, environment and time. Nature Reviews Neurology, 9(11), 617–628. https://doi.org/10.1038/nrneurol.2013.203

Baldinger, R., Katzberg, H. D., & Weber, M. (2012). Treatment for cramps in amyotrophic lateral sclerosis/motor neuron disease. Cochrane Database of Systematic Reviews, Article CD004157. https://doi.org/10.1002/14651858.CD004157.pub2

Caress, J. B., Ciarlone, S. L., Sullivan, E. A., Griffin, L. P., & Cartwright, M. S. (2016). Natural history of muscle cramps in amyotrophic lateral sclerosis. Muscle & Nerve, 53(4), 513–517. https://doi.org/10.1002/mus.24892

Cheung, K., Wood, D., Zhang, K., Ridenour, T. A., Derby, L., St Onge, T., Duan, N., Duer-Hefele, J., Davidson, K. W., Kronish, I. M., & Moise, N. (2020). Personal preferences for personalized trials among patients with chronic experience: An empirical Bayesian analysis of a conjoint survey. BMJ Open, 10(6), Article e036056. https://doi.org/10.1136/bmjopen-2019-036056

Davidson, K. W., Silverstein, M., Cheung, K., Paluch, R. A., & Epstein, L. H. (2021). Experimental designs to optimize treatments for individuals: Personalized N-of-1 trials. JAMA Pediatrics, 175(4), 404–409. https://doi.org/10.1001/jamapediatrics.2020.5801

Edaravone [MCI-186] ALS 19 Study Group. (2017). Safety and efficacy of edaravone in well defined patients with amyotrophic lateral sclerosis: A randomised, double-blind, placebo-controlled trial. Lancet Neurology, 16(7), 505–512. https://doi.org/10.1016/s1474-4422(17)30115-1

Hogben, L., & Sim, M. (1953). The self-controlled and self-recorded clinical trial for low-grade morbidity. British Journal of Preventive and Social Medicine, 7(4), 163–179. https://doi.org/10.1136/jech.7.4.163

Kravitz, R. L., Duan, N. (Eds), and the DEcIDE Methods Center N-of-1 Guidance Panel (Duan, N., Eslick, I., Gabler, N. B., Kaplan, H. C., Kravitz, R. L., Larson, E. B., Pace, W. D., Schmid, C. H., Sim, I., & Vohra, S.) (2014). Design and implementation of N-of-1 trials: A user’s guide. Agency for Healthcare Research and Quality. https://effectivehealthcare.ahrq.gov/products/n-1-trials/research-2014-5

Kravitz, R. L., Schmid, C. H., Marois, M., Wilsey, B., Ward, D., Hays, R. D., Duan, N., Wang, Y., MacDonald, S., Jerant, A., Servadio, J. L., Haddad, D., & Sim, I. (2018). Effect of mobile device-supported single-patient multi-crossover trials on treatment of chronic musculoskeletal pain: A randomized clinical trial. JAMA Internal Medicine, 178(10), 1368–1378. https://doi.org/10.1001/jamainternmed.2018.3981

Kronish, I. M., Cheung, Y. K., Shimbo, D., Julian, J., Gallagher, B., Parsons, F., & Davidson, K. W. (2019). Increasing the precision of hypertension treatment through personalized trials: A pilot study. Journal of General Internal Medicine, 34(6), 839–845. https://doi.org/10.1007/s11606-019-04831-z

Mitsumoto, H., Brooks, B. R., & Silani, V. (2014). Clinical trials in amyotrophic lateral sclerosis: Why so many negative trials and how can trials be improved? Lancet Neurology, 13(11), 1127–1138. https://doi.org/10.1016/s1474-4422(14)70129-2

Mitsumoto, H., Chiuzan, C., Gilmore, M., Zhang, Y., Ibagon, C., McHale, B., Hupf, J., & Oskarsson, B. (2019). A novel muscle cramp scale (MCS) in amyotrophic lateral sclerosis (ALS). Amyotrophic Lateral Sclerosis and Frontotemporal Degeneration, 20(5–6), 328–335. https://doi.org/10.1080/21678421.2019.1603310

Müller, A. R., Brands, M. M. M. G., van de Ven, P. M., Roes, K. C. B., Cornel, M. C., van Karnebeek, C. D. M., Wijburg, F. A., Daams, J. G., Boot, E., & van Eeghen, A. M. (2021). The power of 1: Systematic review of N-of-1 studies in rare genetic neurodevelopmental disorders. Neurology, 96(11), 529–540. https://doi.org/10.1212/WNL.0000000000011597

Roustit, M., Giai, J., Gaget, O., Khouri, C., Mouhib, M., Lotito, A., Blaise, S., Seinturier, C., Subtil, F., Paris, A., Cracowski, C., Imbert, B., Carpentier, P., Vohra, S., & Cracowski, J.-L. (2018). On-demand sildenafil as a treatment for Raynaud phenomenon: A series of N-of-1 trials. Annals of Internal Medicine, 169(10), 694–703. https://doi.org/10.7326/m18-0517

Samuel, J. P., Tyson, J. E., Green, C., Bell, C. S., Pedroza, C., Molony, D., & Samuels, J. (2019). Treating hypertension in children with n-of-1 trials. Pediatrics, 143(4), Article e20181818. https://doi.org/10.1542/peds.2018-1818

Stunnenberg, B., Raaphorst, J., Groenewoud, H., Statland, J., Griggs, R., Woertman, W., Stegeman, D., Timmermans, J., Trivedi, J., Matthews, E., Saris, C., Schouwenberg, B., Drost, G., van Engelen, B., & van der Wilt, G. (2018). A series of aggregated randomized-controlled N-of-1 trials with mexiletine in non-dystrophic myotonia: Clinical trial results and validation of rare disease design (P3.440). 70th Annual Meeting of the American Academy of Neurology, April 21–27, 2018. Neurology, 90(15 Suppl). https://n.neurology.org/content/90/15_Supplement/P3.440

U.S. Food & Drug Administration. (2019). Adaptive designs for clinical trials of drugs and biologics: Guidance for Industry. https://www.fda.gov/media/78495/download

van den Berg, L. H., Sorenson, E., Gronseth, G., Macklin, E. A., Andrews, J., Baloh, R. H., Benatar, M., Berry, J. D., Chio, A., Corcia, P., Genge, A., Gubitz, A. K., Lomen-Hoerth, C., McDermott, C. J., Pioro, E. P., Rosenfeld, J., Silani, V., Turner, M. R., Weber, M., . . . Mitsumoto, H. (2019). Revised Airlie House consensus guidelines for design and implementation of ALS clinical trials. Neurology, 92(14), e1610–e1623. https://doi.org/10.1212/wnl.0000000000007242


Appendices

Appendix A. Theoretical Results Concerning $E(z_i)$

A.1. Proof of Proposition 1.

First, consider the case $\beta_i > 0$ for patient $i$ so the optimal treatment is $x_{it} = 1$. Then, the number of periods the patient is on the optimal treatment equals

$$z_i = \frac{m}{2} + (T-m) I (x_i^* = 1) = \frac{m}{2} + (T-m) I (\hat \beta_i > 0). \qquad \text{(A.1)}$$

The first term in the right-hand-side of (A.1) is the number of optimal treatment periods received in the experimental phase, and the second term is the number in the validation phase. Since $\hat \beta_i \sim N(\beta_i, \tau_i^2)$, we have

$$E \left\{ I( \hat \beta_i > 0) \mid \alpha_i, \beta_i \right\} = \text{Pr} \left( \hat \beta_i > 0 \mid \alpha_i, \beta_i \right) = \Phi \left( \beta_i/\tau_i \right),$$

and therefore,

$$E (z_i \mid \alpha_i, \beta_i) = \frac{m}{2} + (T-m) \Phi (\beta_i / \tau_i) \text{ when } \beta_i > 0. \qquad \text{(A.2)}$$

Next, under the case $\beta_i < 0$, we can analogously derive that

$$E(z_i \mid \alpha_i, \beta_i) = \frac{m}{2} + (T-m) \Phi (-\beta_i / \tau_i ) \text{ when } \beta_i < 0. \qquad \text{(A.3)}$$

Combining (A.2) and (A.3) gives

$$E(z_i \mid \alpha_i, \beta_i) = E(z_i \mid \beta_i) = \frac{m}{2} + (T-m) \Phi ( | \beta_i | / \tau_i ), \qquad \text{(A.4)}$$

which is free of $\alpha_i$. Since $E(z_i) = E \{ E(z_i \mid \beta_i) \}$, the expectations of both sides in (A.4) are to be taken with respect to the distribution of $\beta_i \sim N(\mu_B, \sigma_B^2)$ to complete the proof. By change of variable, we have

$$\begin{aligned} E \, \Phi ( |\beta_i| / \tau_i ) &= \int_{-\infty}^{\infty} \int_{-\infty}^{ \frac{|b|}{\tau_i}} \frac{1}{\sigma_B} \phi(w) \phi \left( \frac{b-\mu_B}{\sigma_B} \right) dw \, db \\ &= \int_{-\infty}^{\infty} \int_{-\infty}^{ \frac{|\mu_B + \sigma_B u |}{\tau_i}} \phi(w) \phi (u) \, dw \, du \\ &= \text{Pr} \left( W \leq \frac{ | \mu_B + \sigma_B U | }{\tau_i} \right). \end{aligned} \qquad \text{(A.5)}$$

The proof is completed by substituting (A.5) into (A.4).
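To make Proposition 1 concrete, the sketch below evaluates $E(z_i) = m/2 + (T-m) \, E\,\Phi(|\beta_i|/\tau_i)$ by numerically integrating the middle line of (A.5) over $u$. The midpoint-rule grid, the function name, and the example inputs ($m=4$, $T=18$, $\mu_B=0$, $\sigma_B=4.8$, $\tau_i=0.8$, the last corresponding to $\sigma=1.6$ with $\rho=0$) are illustrative assumptions, not part of the proof.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal cdf (Phi) and pdf (phi)


def expected_optimal_periods(m, T, mu_b, sigma_b, tau,
                             half_width=8.0, steps=16000):
    """E(z_i) = m/2 + (T - m) * E Phi(|beta_i| / tau_i), per (A.4) and (A.5).

    The expectation over U ~ N(0, 1) is approximated with a midpoint rule on
    [-half_width, half_width]; the grid size is an illustrative choice.
    """
    h = 2 * half_width / steps
    prob = 0.0
    for k in range(steps):
        u = -half_width + (k + 0.5) * h
        inner = STD_NORMAL.cdf(abs(mu_b + sigma_b * u) / tau)
        prob += inner * STD_NORMAL.pdf(u) * h
    return m / 2 + (T - m) * prob


# Expected number of periods (out of T = 18) spent on the optimal treatment.
print(expected_optimal_periods(m=4, T=18, mu_b=0.0, sigma_b=4.8, tau=0.8))
```

A longer experimentation phase shrinks $\tau_i$ and so raises the probability of choosing correctly, but it also leaves fewer validation periods $(T-m)$ in which to benefit; the function can be evaluated over several values of $m$ to see this trade-off.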

Appendix B. Theoretical Results Concerning $E(\bar y_i)$

B.1. Lemma 1

Derivations of $E(\bar y_i)$ will be facilitated by first noting the following lemma:

Lemma 1.  Let $V \sim N(\mu_V, \sigma_V^2)$. Then,

$$E\left\{ V \Phi(V) \right\} = \mu_V \Phi \left( \frac{\mu_V}{ \sqrt{ \sigma_V^2 + 1}} \right) + \frac{\sigma_V^2}{\sqrt{\sigma_V^2 +1 }} \phi \left( \frac{\mu_V}{ \sqrt{ \sigma_V^2 + 1}} \right)$$

where $\Phi$ and $\phi$ respectively denote the standard normal distribution function and density function.

Proof of Lemma 1:  Using the definition of expectation, we derive

$$\begin{aligned} E \{ V \Phi(V) \} &= \int_{-\infty}^{\infty} \int_{-\infty}^v v \, \phi(u) \frac{1}{\sigma_V} \phi \left( \frac{v-\mu_V}{\sigma_V} \right) du \, dv \\ &= \int_{-\infty}^{\infty} \int_{-\infty}^{\mu_V + \sigma_V w} (\mu_V + \sigma_V w) \phi(u) \phi (w) \, du \, dw \\ &= \mu_V \text{Pr}(U < \mu_V + \sigma_V W) + \sigma_V \int_{-\infty}^{\infty} w \, \Phi( \mu_V + \sigma_V w ) \phi (w) \, dw \\ &= \mu_V \Phi \left( \frac{\mu_V}{\sqrt{1 + \sigma_V^2}} \right) + \sigma_V^2 \int_{-\infty}^{\infty} \phi(w) \phi ( \mu_V + \sigma_V w ) \, dw \end{aligned} \qquad \text{(B.1)}$$

where $U, W$ are independent standard normal variables. Thus, $U - \sigma_V W \sim N(0, 1+ \sigma_V^2)$, and the first term in (B.1) can be evaluated as

$$\mu_V \text{Pr}(U < \mu_V + \sigma_V W) = \mu_V \Phi \left( \frac{\mu_V}{\sqrt{1 + \sigma_V^2}} \right). \qquad \text{(B.2)}$$

Next, the single integral in the second term in (B.1) can be evaluated using integration by parts:

$$\begin{aligned} \int_{-\infty}^{\infty} w \, \Phi\left( \mu_V + \sigma_V w \right) \phi (w) \, dw &= \sigma_V \int_{-\infty}^{\infty} \phi\left( \mu_V + \sigma_V w \right) \phi (w) \, dw \\ &= \sigma_V \frac{1}{\sqrt{\sigma_V^2+1} } \phi \left( \frac{\mu_V}{\sqrt{\sigma_V^2 +1}}\right). \end{aligned} \qquad \text{(B.3)}$$

Equation (B.3) follows from a straightforward calculation. The proof of Lemma 1 is thus completed by plugging (B.2) and (B.3) into (B.1).
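Lemma 1 is easy to check numerically. The following sketch compares the closed form with a direct midpoint-rule approximation of $E\{V\Phi(V)\}$, writing $V = \mu_V + \sigma_V W$ with $W$ standard normal. The integration grid and the test values of $(\mu_V, \sigma_V)$ are arbitrary choices made for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def lemma1_closed_form(mu_v, sigma_v):
    """Right-hand side of Lemma 1."""
    s = sqrt(sigma_v**2 + 1)
    return (mu_v * STD_NORMAL.cdf(mu_v / s)
            + sigma_v**2 / s * STD_NORMAL.pdf(mu_v / s))


def lemma1_quadrature(mu_v, sigma_v, half_width=8.0, steps=16000):
    """Midpoint-rule approximation of E{V Phi(V)} with V = mu_v + sigma_v * W."""
    h = 2 * half_width / steps
    total = 0.0
    for k in range(steps):
        w = -half_width + (k + 0.5) * h
        v = mu_v + sigma_v * w
        total += v * STD_NORMAL.cdf(v) * STD_NORMAL.pdf(w) * h
    return total


for mu_v, sigma_v in [(0.0, 1.0), (0.5, 2.0), (-1.0, 0.7), (3.0, 6.0)]:
    exact = lemma1_closed_form(mu_v, sigma_v)
    approx = lemma1_quadrature(mu_v, sigma_v)
    print(f"mu_V={mu_v:5.1f} sigma_V={sigma_v:4.1f} "
          f"closed form={exact:.6f} quadrature={approx:.6f}")
```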

B.2. Proof of Proposition 2

Recall that $\bar y_i$ denotes the average outcome of patient $i$ in all $T$ treatment periods in an N-of-1 trial. Hence,

$$\begin{aligned} E(\bar y_i) &= \frac{1}{T} \sum_{t=1}^T E(\alpha_i + \beta_i x_{it} + \epsilon_{it} ) = \mu_A + \frac{1}{T} \sum_{t=1}^T E(\beta_i x_{it}) \\ &= \mu_A + \left(1 - \frac{m}{T} \right) E(\beta_i x_i^*). \end{aligned} \qquad \text{(B.4)}$$

Equation (B.4) holds because the balanced design gives $\sum_{t=1}^m x_{it} = 0$ in the experimentation phase, while $x_{it} = x_i^*$ in the remaining $T-m$ periods. Next, since $\hat \beta_i \sim N(\beta_i, \tau_i^2)$, we have

$$\begin{aligned} E( \beta_i x_i^* ) &= E \left\{ \beta_i E( x_i^* \mid \beta_i) \right\} = E \left[ \beta_i E \left\{ \left. 2 I(\hat \beta_i > 0) - 1 \right| \beta_i \right\} \right] = E \left\{ 2 \beta_i \Phi \left( \frac{\beta_i}{\tau_i} \right) - \beta_i \right\} \\ &= 2 E \left\{ \beta_i \Phi \left( \frac{\beta_i}{\tau_i} \right) \right\} - \mu_B \\ &= 2 \tau_i E \left\{ \frac{ \beta_i}{\tau_i} \Phi \left( \frac{\beta_i}{\tau_i} \right) \right\} - \mu_B \\ &= \mu_B \left\{ 2 \Phi \left( \frac{\mu_B}{\sqrt{\sigma_B^2 + \tau_i^2}} \right) - 1 \right\} + \frac{2 \sigma_B^2}{\sqrt{ \sigma_B^2 + \tau_i^2}} \phi \left( \frac{\mu_B}{\sqrt{\sigma_B^2 + \tau_i^2}} \right). \end{aligned} \qquad \text{(B.5)}$$

Expression (B.5) is obtained by applying Lemma 1 with $V = \beta_i / \tau_i$, so that $\mu_V = \mu_B/\tau_i$ and $\sigma_V = \sigma_B/\tau_i$. Putting (B.5) into (B.4) gives

$$E(\bar y_i) = \mu_A + \left(1 - \frac{m}{T} \right) \left[ \mu_B \left\{ 2 \Phi \left( \frac{\mu_B}{\sqrt{\sigma_B^2 + \tau_i^2}} \right) - 1 \right\} + \frac{2 \sigma_B^2}{\sqrt{ \sigma_B^2 + \tau_i^2}} \phi \left( \frac{\mu_B}{\sqrt{\sigma_B^2 + \tau_i^2}} \right) \right] \qquad \text{(B.6)}$$

thus completing the proof of Proposition 2.
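A small seeded Monte Carlo experiment can be used to sanity-check (B.6). The sketch below draws $\beta_i \sim N(\mu_B, \sigma_B^2)$ and $\hat\beta_i \sim N(\beta_i, \tau_i^2)$, sets $x_i^* = 1$ when $\hat\beta_i > 0$ and $-1$ otherwise, and averages $\mu_A + (1 - m/T)\beta_i x_i^*$. To reduce noise it replaces $\alpha_i$ and $\epsilon_{it}$ by their means, which is a simplification of ours; the value $\mu_A = 0$, the sample size, and the seed are likewise hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def mean_outcome_formula(m, T, mu_a, mu_b, sigma_b, tau):
    """E(ybar_i) from (B.6)."""
    s = sqrt(sigma_b**2 + tau**2)
    gain = (mu_b * (2 * STD_NORMAL.cdf(mu_b / s) - 1)
            + 2 * sigma_b**2 / s * STD_NORMAL.pdf(mu_b / s))
    return mu_a + (1 - m / T) * gain


def mean_outcome_simulated(m, T, mu_a, mu_b, sigma_b, tau,
                           n_patients=200_000, seed=0):
    """Monte Carlo average of mu_a + (1 - m/T) * beta_i * x_i^*."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_patients):
        beta = rng.gauss(mu_b, sigma_b)       # patient-specific effect
        beta_hat = rng.gauss(beta, tau)       # estimate from experimentation
        x_star = 1 if beta_hat > 0 else -1    # treatment used in validation
        # The balanced experimentation phase contributes 0 on average.
        total += mu_a + (T - m) / T * beta * x_star
    return total / n_patients


params = dict(m=4, T=18, mu_a=0.0, mu_b=1.2, sigma_b=4.8, tau=0.8)
print("formula  :", mean_outcome_formula(**params))
print("simulated:", mean_outcome_simulated(**params))
```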

B.3. Derivation of optimal experimentation length $m^*$ and Main Result 1

For least squares $\hat \beta_i^{LS}$, the variance $\tau_i^2 = \lambda_i \sigma^2 / m$, where $\lambda_i = 1 + \sum_{s \neq t} x_{is} x_{it} \rho_{st} / m$. Further supposing $\mu_B = 0$ simplifies (B.6) to

$$E ( \bar y_i) = \mu_A + \left(1 - \frac{m}{T} \right) \frac{2 \sigma_B^2}{\sqrt{\sigma_B^2 + \lambda_i \sigma^2/m}} \phi (0).$$

Hence, maximizing $E(\bar y_i)$ as a function of $m$ is equivalent to maximizing the function

$$h(m) = \left(1 - \frac{m}{T} \right) \frac{1}{ \sqrt{\xi_i + 1/m}}$$

where $\xi_i = \sigma_B^2 / (\lambda_i \sigma^2)$ is free of $m$. Using standard calculus arguments, we can show that the maximizer $m^*$ of $h(m)$ solves the equation $2 \xi_i m^{*2} + 3 m^* - T = 0$ or equivalently,

$$m^* = \frac{\sqrt{ 9 + 8 \xi_i T} - 3}{4 \xi_i }. \qquad \text{(B.7)}$$

The derivation of $m^*$ is completed by multiplying the numerator and the denominator of (B.7) by $\sqrt{9 + 8 \xi_i T} + 3$, which gives

$$m^* = \frac{ 2T } { \sqrt{9 + 8 \xi_i T} + 3}.$$

Now, since $\xi_i \geq 0$, we have $m^* \leq 2T / (\sqrt{9} + 3) = T/3$. As a practical note, due to discreteness in $m$, the optimal $m$ may be a result of rounding up $m^*$. Hence a slightly less sharp inequality would be $m^* \lesssim T/3 < T/3 + 1$.
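The closed form for $m^*$ and the bound $m^* \leq T/3$ can be illustrated with a short sketch that compares the formula against a brute-force grid search over $h(m)$. The values of $\xi_i$ below are hypothetical, except that $\xi_i = 9$ corresponds to $\sigma_B = 4.8$ and $\sigma = 1.6$ with $\lambda_i = 1$; the grid resolution is an arbitrary choice.

```python
from math import sqrt


def h(m, T, xi):
    """Objective from Appendix B.3: (1 - m/T) / sqrt(xi + 1/m)."""
    return (1 - m / T) / sqrt(xi + 1 / m)


def optimal_length(T, xi):
    """Closed-form maximizer m* = 2T / (sqrt(9 + 8 xi T) + 3)."""
    return 2 * T / (sqrt(9 + 8 * xi * T) + 3)


T = 18
grid = [k * T / 100_000 for k in range(1, 100_000)]  # points in (0, T)
for xi in (0.0, 0.05, 1.0, 9.0):
    m_star = optimal_length(T, xi)
    # Independent check: maximize h numerically over the grid.
    m_grid = max(grid, key=lambda m: h(m, T, xi))
    print(f"xi={xi:<5} m*={m_star:.3f}  grid argmax={m_grid:.3f}  T/3={T / 3:.1f}")
```

The bound is attained at $\xi_i = 0$, where $m^* = T/3$, and $m^*$ decreases as $\xi_i$ grows, that is, as treatment effects become more heterogeneous relative to the noise.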

Appendix C. Theoretical Results Concerning Power

In this section, we derive the expressions involved in the power of the $Z$-test, namely $\Delta$, $\text{var}(y_i^*)$, and $\text{var}(y_i')$.

Recall that $p_0$ and $p_1$ respectively denote the probabilities that the treating physicians will prescribe mexiletine $(x_{it}=-1)$ and baclofen $(x_{it}=1)$ under the treatment program. Based on model (3.2), we can express the quality of an N-of-1 trial as:

$$\begin{aligned} E(y_i^*) &= \frac{1}{T-m} \sum_{t=m+1}^T E(\alpha_i + \beta_i x_{it} + \epsilon_{it} ) \\ &= \frac{1}{T-m} \sum_{t=m+1}^T E(\alpha_i + \beta_i x_i^* + \epsilon_{it} ) \\ &= \mu_A + E(\beta_i x_i^*), \end{aligned}$$

and analogously $E(y_i') = \mu_A + E(\beta_i x_i')$ where $x_i'$ is the treatment given to patient $i$ in SOC. Hence $\Delta = E(\beta_i x_i^*) - E(\beta_i x_i')$. Under the independence assumption of $x_i'$ and $\beta_i$, we further obtain $E(y_i') = \mu_A + (2p_1-1) \mu_B$, and

$$\begin{aligned} \Delta &= E( \beta_i x_i^*) - \mu_B (2 p_1 - 1) \\ &= \mu_B \left\{ 2 \Phi \left( \frac{\mu_B}{\sqrt{\sigma_B^2 + \tau_i^2}} \right) - 1 \right\} + \frac{2 \sigma_B^2}{\sqrt{ \sigma_B^2 + \tau_i^2}} \phi \left( \frac{\mu_B}{\sqrt{\sigma_B^2 + \tau_i^2}} \right) - \mu_B (2 p_1 - 1) \\ &= 2 \mu_B \left\{ \Phi \left( \frac{\mu_B}{\sqrt{\sigma_B^2 + \tau_i^2}} \right) - p_1 \right\} + \frac{2 \sigma_B^2}{\sqrt{ \sigma_B^2 + \tau_i^2}} \phi \left( \frac{\mu_B}{\sqrt{\sigma_B^2 + \tau_i^2}} \right) \end{aligned} \qquad \text{(C.1)}$$

where $E( \beta_i x_i^*)$ is given in (B.5).

Next,

$$\begin{aligned} \text{var}(y_i^*) &= \text{var}\left\{ \alpha_i + \beta_i x_{i}^* + \sum_{t=m+1}^T \epsilon_{it} / (T-m) \right\} \\ &= \sigma_A^2 + \text{var}( \beta_i x_i^*) + \frac{\sigma^2}{T-m} = \sigma_A^2 + \sigma_B^2 + \mu_B^2 - \{ E(\beta_i x_{i}^*) \}^2 + \frac{\sigma^2}{T-m} \\ &= \sigma_A^2 + \sigma_B^2 + \mu_B^2 - \{ \Delta + \mu_B (2p_1-1) \}^2 + \frac{\sigma^2}{T-m} =: \sigma^{*2}. \end{aligned}$$

The last equality is a result of (C.1). Similarly, we can show

$$\begin{aligned} \text{var}(y_i') &= \sigma_A^2 + \sigma_B^2 + \mu_B^2 - \mu_B^2 (E x_i')^2 + \frac{\sigma^2}{T-m} \\ &= \sigma_A^2 + \sigma_B^2 + \mu_B^2 - \mu_B^2 (2 p_1 - 1)^2 + \frac{\sigma^2}{T-m}. \end{aligned}$$

Finally, under the null $\mu_B = 0$, we have

$$\frac{\Delta} { \sqrt{ \text{var}(y_i^*) + \text{var}(y_i')}} = \frac{2 \sigma_B^2 \phi(0)}{\sqrt{ \sigma_B^2 + \tau_i^2} \sqrt{2 \sigma_A^2 + 2 \sigma_B^2 - \frac{4 \sigma_B^4 \phi^2(0)}{ \sigma_B^2 + \tau_i^2}+ 2 \sigma^2 / (T-m)} }.$$

Main Result 2 is proved by dividing the numerator and the denominator in the above expression by $\sigma_B^2$, as a result of which the numerator will be a constant and the denominator will be a decreasing function of $\sigma_B^2$.
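The monotonicity argument can be illustrated numerically. The sketch below evaluates the ratio $\Delta / \sqrt{\text{var}(y_i^*) + \text{var}(y_i')}$ at $\mu_B = 0$ for a few values of $\sigma_B$ and shows that it grows with $\sigma_B$, which is how we read the argument above. It assumes $\tau_i^2 = \sigma^2/m$ (i.e., $\lambda_i = 1$); the function name and the grid of $\sigma_B$ values are illustrative.

```python
from math import sqrt
from statistics import NormalDist

PHI0 = NormalDist().pdf(0.0)  # standard normal density at zero


def signal_to_noise(sigma_b, sigma_a, sigma, m, T):
    """Delta / sqrt(var(y*) + var(y')) at mu_B = 0, assuming tau^2 = sigma^2 / m."""
    tau2 = sigma**2 / m
    s2 = sigma_b**2 + tau2
    delta = 2 * sigma_b**2 * PHI0 / sqrt(s2)
    total_var = (2 * sigma_a**2 + 2 * sigma_b**2
                 - 4 * sigma_b**4 * PHI0**2 / s2
                 + 2 * sigma**2 / (T - m))
    return delta / sqrt(total_var)


for sigma_b in (1.2, 2.4, 4.8, 9.6):
    ratio = signal_to_noise(sigma_b, sigma_a=4.8, sigma=1.6, m=4, T=18)
    print(f"sigma_B={sigma_b:4.1f}  ratio={ratio:.3f}")
```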

For the situations where the physicians have patient-specific knowledge to inform treatments under the SOC, we may postulate that

$$x_i' = \begin{cases} 2 I(\beta_i > 0) - 1 & \text{with probability } \theta_C, \\ 1 & \text{with probability } (1-\theta_C) p_1, \\ -1 & \text{with probability } (1-\theta_C)(1-p_1). \end{cases} \qquad \text{(C.2)}$$

The parameter $\theta_C$ indicates how complete the physicians' knowledge is about the specific best treatments for their patients, with $\theta_C = 1$ indicating perfect knowledge and $\theta_C = 0$ indicating no additional knowledge beyond the population-level information $p_1$. Under the SOC treatment system (C.2), we have

$$E(\beta_i x_i') = E \left\{ \beta_i E(x_i' \mid \beta_i) \right\} = 2 \theta_C E \left\{\beta_i I(\beta_i > 0) \right\} - \theta_C \mu_B + (1-\theta_C)\mu_B (2p_1-1) \qquad \text{(C.3)}$$

where

$$E \{\beta_i I(\beta_i > 0) \} = \sigma_B \phi( \mu_B/\sigma_B ) + \mu_B \Phi( \mu_B/\sigma_B). \qquad \text{(C.4)}$$

Using (B.5), (C.3), and (C.4), after some algebra, we have

$$\begin{aligned} \Delta &= E( \beta_i x_i^*) - E( \beta_i x_i') \\ &= 2 (1-\theta_C) \mu_B \left\{ \Phi \left( \frac{\mu_B}{\sqrt{\sigma_B^2 + \tau_i^2}} \right) - p_1 \right\} + \frac{2 (1 - \theta_C) \sigma_B^2}{\sqrt{ \sigma_B^2 + \tau_i^2}} \phi \left( \frac{\mu_B}{\sqrt{\sigma_B^2 + \tau_i^2}} \right) \\ &\quad + 2 \theta_C \mu_B \left\{ \Phi \left( \frac{\mu_B}{\sqrt{\sigma_B^2 + \tau_i^2}} \right) - \Phi\left( \frac{\mu_B}{\sigma_B} \right)\right\} + 2 \theta_C \left[ \frac{\sigma_B^2}{\sqrt{ \sigma_B^2 + \tau_i^2}} \phi \left( \frac{\mu_B}{\sqrt{\sigma_B^2 + \tau_i^2}} \right) - \sigma_B \phi \left(\frac{\mu_B}{\sigma_B}\right) \right] \end{aligned}$$
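The general expression for $\Delta$ is a $\theta_C$-weighted mixture of two terms, which the following sketch makes explicit. With $\theta_C = 0$ it reduces to (C.1); with $\theta_C = 1$ the physicians already know each patient's best treatment, so $\Delta$ cannot be positive and approaches zero as $\tau_i \to 0$. The function name and the example inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def delta_general(mu_b, sigma_b, tau, p1, theta_c):
    """Quality improvement Delta under the SOC model (C.2)."""
    s = sqrt(sigma_b**2 + tau**2)
    z_n1, z_soc = mu_b / s, mu_b / sigma_b
    dens_n1 = sigma_b**2 / s * STD_NORMAL.pdf(z_n1)
    # Term weighted by (1 - theta_C): SOC uses only population-level p1, as in (C.1).
    uninformed = 2 * mu_b * (STD_NORMAL.cdf(z_n1) - p1) + 2 * dens_n1
    # Term weighted by theta_C: SOC knows the sign of beta_i exactly.
    informed = (2 * mu_b * (STD_NORMAL.cdf(z_n1) - STD_NORMAL.cdf(z_soc))
                + 2 * (dens_n1 - sigma_b * STD_NORMAL.pdf(z_soc)))
    return (1 - theta_c) * uninformed + theta_c * informed


for theta_c in (0.0, 0.25, 0.5, 1.0):
    d = delta_general(mu_b=0.0, sigma_b=4.8, tau=0.8, p1=0.5, theta_c=theta_c)
    print(f"theta_C={theta_c:.2f}  Delta={d:.3f}")
```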