Evaluating Personalized (N-of-1) Trials in Rare Diseases: How Much Experimentation Is Enough?

For rare diseases, conducting large, randomized trials of new treatments can be infeasible due to limited sample size, and it may answer the wrong scientific questions due to heterogeneity of treatment effects. Personalized (N-of-1) trials are multi-period crossover studies that aim to estimate individual treatment effects, thereby identifying the optimal treatments for individuals. This article examines the statistical design issues of evaluating a personalized (N-of-1) treatment program in people with amyotrophic lateral sclerosis (ALS). We propose an evaluation framework based on an analytical model for longitudinal data observed in a personalized trial. Under this framework, we address two design parameters: length of experimentation in each trial and number of trials needed. For the former, we consider patient-centric design criteria that aim to maximize the benefits of enrolled patients. Using theoretical investigation and numerical studies, we demonstrate that, from a patient’s perspective, the duration of an experimentation period should be no longer than one-third of the entire follow-up period of the trial. For the latter, we provide analytical formulae to calculate the power for testing quality improvement due to personalized trials in a randomized evaluation program and hence determine the required number of trials needed for the program. We apply our theoretical results to design an evaluation program for ALS treatments informed by pilot data and show that the length of experimentation has a small impact on power relative to other factors such as the degree of heterogeneity of treatment effects.


Introduction
When managing chronic diseases and conditions, patients commonly try different treatments over time before finding the right treatments. The practice of N-of-1 trials operationalizes this type of patient-centered experimentation by randomizing treatments to single patients in multiple crossover periods, often in a balanced fashion. N-of-1 trials can be used to identify the optimal personalized treatment for single patients in situations involving evidence for heterogeneity of treatment effects (HTE) or the lack of a cure (Davidson et al. 2021). As such, these trials are sometimes called single-patient trials or personalized trials. First introduced by Hogben and Sim (1953), N-of-1 trials have recently been applied to treat rare diseases (Roustit et al. 2018), as well as common chronic conditions such as hypertension (Kronish et al. 2019; Samuel et al. 2019). The use of personalized (N-of-1) trials in treating rare diseases is particularly appealing because demonstrating comparative effectiveness of treatments at the population level via parallel-group randomized trials is often infeasible.
In this article, we consider personalized (N-of-1) trials of treatments for people with amyotrophic lateral sclerosis (ALS). ALS is a rare neurodegenerative disease that affects motor neurons in the brain and spinal cord. Although two modestly effective disease-modifying medications have been approved for the treatment of ALS (Edaravone ALS 19 Study Group 2017), the disease has no cure, and thus, symptomatic treatments remain an important strategy to improve the quality of life in people with ALS (Mitsumoto, Brooks, and Silani 2014). In particular, muscle cramps are disabling symptoms affecting over 90% of ALS patients, with demonstrated between-patient variability and yet stable manifestation of symptoms in a patient (Caress et al. 2016). Several treatments targeting muscle cramps have been evaluated and have shown mixed results, suggesting the presence of HTE or inadequate statistical power for definitive conclusions (Baldinger, Katzberg, and Weber 2012). Furthermore, ALS itself has been considered markedly heterogeneous in its pathogeneses, disease manifestations, and disease progression (Al-Chalabi and Hardiman 2013; van den Berg et al. 2019). These are the clinical situations in which personalized (N-of-1) trials can help patients identify the best treatments for themselves (n.d.a).
Despite renewed interest in N-of-1 trials and numerous recent applications, the literature has offered little discussion on the evaluation of the usefulness of N-of-1 trials. As N-of-1 trials typically require active physician involvement, intense monitoring, and frequent data collection compared with usual care, these additional costs and resources warrant careful evaluation of effectiveness before said trials are used in practice as regular clinical service. The primary evaluation question is "Does the practice of N-of-1 trials in clinical care improve outcomes on the standard of care?" However, presuming the quality of treatment decisions based on N-of-1 trials is higher than what standard of care would prescribe, reports of N-of-1 trials often describe only the applications and results of the trials without plans to address the evaluation question. An exception is Kravitz et al. (2018), who compare N-of-1 intervention against the usual care for patients with musculoskeletal pain in a randomized fashion using data collected after experimentation ends and find no evidence of superior outcomes among participants undergoing N-of-1 trials. However, when planning the study, the authors had not considered the underlying model that accounts for variability and

The Anatomy of a Personalized (N-of-1) Trial
We consider an evaluation program comparing the effectiveness of personalized (N-of-1) trials in treating muscle cramps in people with ALS relative to the institutional standard of care. Under the program, people with ALS will be randomized to receive personalized (N-of-1) trials that compare two standard drugs prescribed for muscle cramps, mexiletine and baclofen. In each trial, a patient will be given the two drugs sequentially over $T = 18$ two-week treatment periods in two phases. The first phase consists of $m$ treatment periods (with $m < T$) when the two drugs are randomized in a multiple crossover fashion. This phase shall be referred to as the experimentation phase. In the remaining $T - m$ treatment periods, the patient will continue with a drug treatment selected based on data in the experimentation phase. This phase shall be referred to as the validation phase (Figure 1).
During the treatment periods, the Columbia Muscle Cramp Scale (MCS) will be collected weekly to result in two MCS measurements for each period: one at the end of week 1 and one at the end of week 2. The MCS is a validated, composite score summarizing the frequency, severity, and clinical relevance of cramps in people with ALS (Mitsumoto et al. 2019). While the study does not include washout periods between treatments, only the measurement at the end of each two-week period will be used in the primary analysis in order to avoid carryover effects of the drugs.
Sandwiched between the two treatment phases is a feedback period where the MCS data in the experimentation phase are reviewed with the treating physician and the patient. The feedback period enables data-driven treatment decisions by providing the stakeholders with data visualization as well as numerical comparison (Davidson et al. 2021).

Standard of Care
In this article, we focus on a randomized controlled evaluation program where patients are randomized between an N-of-1 trial and standard of care (SOC). As depicted in Figure 1, a patient under SOC will be given either mexiletine or baclofen for 36 weeks, corresponding to the 18 two-week treatment periods in the N-of-1 trials, and will have the same follow-up schedule as the N-of-1 trial patients. Treatments in the 'experimentation phase' will be determined by the treating physicians. The 'feedback period' in the SOC arm may be viewed as a sham intervention and be conducted as a regular clinic visit before the patient continues into the 'validation phase' with the same drug in the remaining $T - m$ treatment periods. By virtue of randomization, MCS collected in the validation phase under SOC will serve as the control data and allow for an unbiased comparison with the validation phase in the N-of-1 trial patients.
Let $p_0$ denote the probability that mexiletine will be prescribed under SOC and $p_1$ the probability that baclofen will be prescribed, such that $p_0 + p_1 = 1$. The special case $p_0 = 1$ and $p_1 = 0$ corresponds to a clinical scenario where mexiletine is considered the standard treatment. Generally, the program probability parameters $p_0$, $p_1$ are somewhere between 0 and 1 when no clear best treatment exists. A program equipoise may be defined as when the treating physicians will give either of the drugs with equal likelihood, that is, $p_0 = p_1 = 0.5$.
These program parameters affect the quality of treatment under standard of care, and hence the advantage of N-of-1 trials over standard of care. At the end of the evaluation program, these parameters can be estimated using the control data.

Design Parameters
While the study duration (or the number of treatment periods $T$) is determined based on feasibility and how long a patient can be followed in the evaluation program, an N-of-1 trial under the evaluation framework is defined by the length $m$ of the experimentation phase, and hence the length $T - m$ of the validation phase. Intuitively, the quality of the treatment decision by an N-of-1 trial improves with a larger $m$ as more data will be available during the feedback period. On the other hand, a long experimentation phase may place excessive burden on patients without benefitting them and imply a short validation phase for a given $T$. Rather than maximize accuracy, the experimentation length $m$ will respond to the question "How much experimentation is needed for an N-of-1 trial to be beneficial to an individual?" A second design parameter is the specification of an analytical plan used to guide treatment selection during the feedback period. Principled statistical or data science methods should be employed to ensure the analysis is rigorous, while a prespecified plan entails preprogrammed algorithms that in turn facilitate quick feedback to the stakeholders.
Finally, as in conventional randomized controlled trials, the number of patients randomized in an evaluation program will need to be determined to ensure adequate statistical power for the primary evaluation question on whether N-of-1 trials improve outcomes.
To summarize, the design parameters that need to be prespecified at the planning stage of an evaluation program are the primary analysis plan used in the feedback period, the experimentation length ($m$) for each individual, and the number of individuals required. These will be discussed in the next two sections.

An Analytical Model for N-of-1 Trials
Let $y_{it}$ be the outcome of patient $i$ in treatment period $t$ and $x_{it} \in \{-1, 1\}$ be the corresponding treatment for $i = 1, \ldots, n$ and $t = 1, \ldots, T$. Without loss of generality, we assume a large value of the outcome $y_{it}$ is desirable. To put the notation in the context of our study, we let $y_{it}$ denote the negative value of MCS at the end of each two-week treatment period.
For the treatments, baclofen is coded as $x_{it} = 1$ and mexiletine as $x_{it} = -1$. In this article, we focus on balanced sequences between baclofen and mexiletine in the experimentation phase, that is, assuming

$$\sum_{t=1}^{m} x_{it} = 0. \tag{3.1}$$

Consider the outcome model

$$y_{it} = \alpha_i + \beta_i x_{it} + \epsilon_{it}, \tag{3.2}$$

where $\alpha_i$ is the patient-specific mean outcome, $\beta_i$ is the patient-specific treatment effect, and the noise terms $\epsilon_{it}$ are mean-zero normal with $\mathrm{cov}(\epsilon_{it}, \epsilon_{is}) = \rho_{st}\sigma^2$ and $\rho_{tt} = 1$. To reflect heterogeneous symptoms and HTE among the patients, we postulate

$$\alpha_i \sim N(\mu_A, \sigma_A^2) \quad \text{and} \quad \beta_i \sim N(\mu_B, \sigma_B^2).$$

The mean $\mu_B$ indicates the average treatment effect and the variance $\sigma_B^2$ indicates the extent of HTE in the disease population.
While $\mu_B = 0$ represents the null scenario in which there is no average treatment effect, a large value of $\sigma_B^2$ indicates the need for personalizing treatments.
Under model (3.2), the optimal treatment for patient $i$ can be expressed as $2I(\beta_i > 0) - 1$, where $I(\cdot)$ is an indicator function. During the feedback period, we may present to patient $i$ an estimated treatment effect $\hat\beta_i$ based on the experimentation phase data $\{(x_{it}, y_{it}) : t = 1, \ldots, m\}$ along with the estimated optimal personalized treatment for the patient:

$$x_i^* = 2I(\hat\beta_i > 0) - 1. \tag{3.3}$$

Subsequently, in the event of perfect adherence to the analysis result, the patient will receive the estimated optimal treatment (3.3) in the validation phase, that is, $x_{it} = x_i^*$ for $t = m + 1, \ldots, T$.

Some practical notes on the choice of $\hat\beta_i$ are in order. For the purposes of providing quick feedback, a broad range of estimators can be considered. The theoretical results derived in the following sections will hold for any estimators that are approximately normally distributed with mean $\beta_i$ and some finite variance $\tau_i^2$. A simple example is the patient-specific least squares estimator $\hat\beta_i^{LS} = \sum_{t=1}^{m} x_{it} y_{it}/m$ for patient $i$. The least squares estimator is unbiased for the patient-specific treatment effect $\beta_i$ regardless of the variance-covariance structure of $\epsilon_{it}$, with variance

$$\mathrm{var}(\hat\beta_i^{LS} \mid \alpha_i, \beta_i) = \frac{\lambda_i \sigma^2}{m}, \qquad \lambda_i = 1 + \frac{1}{m}\sum_{s \ne t} x_{is} x_{it} \rho_{st}. \tag{3.4}$$

Note that the conditional variance (3.4) is free of the patient-specific parameters $\alpha_i$ and $\beta_i$. For the purposes of planning an N-of-1 trial, we will focus on the use of least squares. However, in the actual analysis, if additional information is available to inform the appropriate correlation structure of the data, likelihood-based estimation or weighted least squares accounting for such structure may improve efficiency.
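As a concrete illustration of the feedback-period computation, the following minimal Python sketch evaluates the least squares estimate $\sum_t x_{it} y_{it}/m$, the factor $\lambda_i = 1 + \sum_{s \ne t} x_{is} x_{it} \rho_{st}/m$, and the resulting treatment recommendation. The function names, the outcome values, and the correlation of 0.3 are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def ls_estimate(x, y):
    """Patient-specific least squares estimate: sum_t x_t * y_t / m."""
    if sum(x) != 0:
        raise ValueError("sequence must be balanced (equal numbers of +1 and -1)")
    return sum(xt * yt for xt, yt in zip(x, y)) / len(x)


def lambda_factor(x, rho):
    """lambda_i = 1 + sum_{s != t} x_s * x_t * rho[s][t] / m."""
    m = len(x)
    off_diag = sum(x[s] * x[t] * rho[s][t]
                   for s in range(m) for t in range(m) if s != t)
    return 1 + off_diag / m


def recommended_treatment(beta_hat):
    """Estimated optimal treatment: +1 (baclofen) if beta_hat > 0, else -1 (mexiletine)."""
    return 1 if beta_hat > 0 else -1


# Hypothetical experimentation phase with m = 4 periods.
x = [1, -1, -1, 1]                 # +1 = baclofen, -1 = mexiletine
y = [-10.0, -14.0, -13.0, -9.0]    # made-up negative MCS values
beta_hat = ls_estimate(x, y)

# Compound symmetry with an assumed between-period correlation of 0.3.
rho = [[1.0 if s == t else 0.3 for t in range(4)] for s in range(4)]
lam = lambda_factor(x, rho)
```

With these made-up values the estimate is positive, so baclofen would be recommended, and the factor $\lambda$ equals $1 - \rho$, as expected for a balanced sequence under compound symmetry.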

Patient-Centric Criteria and Length of Experimentation Phase
In this subsection, we discuss the choice of the experimentation length m of an N-of-1 trial with respect to two different patient-centric criteria, both of which aim to maximize the benefits to patients on N-of-1 trials.
The first criterion is defined as the expected number of periods where a patient receives the optimal treatment. Mathematically, this criterion is denoted as $E(z_i)$, where $z_i$ is the number of periods in which patient $i$ receives the optimal treatment over the $T$ treatment periods.
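Proposition 1. Suppose $\hat\beta_i \sim N(\beta_i, \tau_i^2)$ under a balanced experimentation phase (3.1). Then

$$E(z_i) = \frac{m}{2} + (T - m)\Pr\left(W \le \frac{|\mu_B + \sigma_B U|}{\tau_i}\right),$$

where $W$, $U$ are independent standard normal variables. Furthermore, if $\mu_B = 0$, then

$$E(z_i) = \frac{m}{2} + (T - m)\,G\left(\frac{\sigma_B}{\tau_i}\right),$$

where $G$ is the cumulative distribution function of $W/|U|$, which is a pivotal distribution.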
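The null-case criterion is easy to evaluate numerically because the ratio $W/|U|$ of independent standard normal variables follows a standard Cauchy distribution, so that $G(x) = 1/2 + \arctan(x)/\pi$. The sketch below assumes that, under $\mu_B = 0$, the criterion takes the form $E(z_i) = m/2 + (T - m)G(\sigma_B/\tau_i)$, which is what a balanced experimentation phase and a normally distributed estimator imply. The function name and the illustrative inputs are hypothetical.

```python
import math


def expected_optimal_periods_null(T, m, sigma_B, tau):
    """E(z_i) when mu_B = 0: m/2 + (T - m) * G(sigma_B / tau)."""
    # G is the standard Cauchy CDF, the distribution of W / |U|.
    G = 0.5 + math.atan(sigma_B / tau) / math.pi
    return m / 2 + (T - m) * G


# Illustrative inputs: T = 18 periods, m = 4, sigma_B = 4.8, and
# tau = sigma / sqrt(m) = 1.6 / 2 for least squares with independent noise.
ez = expected_optimal_periods_null(T=18, m=4, sigma_B=4.8, tau=0.8)
print(f"Expected number of periods on the optimal treatment: {ez:.2f}")
```

Setting $m = T$ in the same function returns $T/2$, the value obtained when every period is spent experimenting.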
The second patient-centric criterion is defined as the expected average outcome of a patient during an N-of-1 trial. This criterion is denoted as $E(\bar y_i)$, where $\bar y_i = \sum_{t=1}^{T} y_{it}/T$ is the average outcome of the patient in all $T$ treatment periods.
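Proposition 2. Under the same condition as in Proposition 1, for $0 < m \le T$,

$$E(\bar y_i) = \mu_A + \frac{T - m}{T}\left[\mu_B\left\{2\Phi\left(\frac{\mu_B}{\sqrt{\sigma_B^2 + \tau_i^2}}\right) - 1\right\} + \frac{2\sigma_B^2}{\sqrt{\sigma_B^2 + \tau_i^2}}\,\phi\left(\frac{\mu_B}{\sqrt{\sigma_B^2 + \tau_i^2}}\right)\right],$$

where $\Phi$ and $\phi$ respectively denote the standard normal distribution function and density.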
We can derive a few practical principles from Proposition 1 and Proposition 2. First, conducting an N-of-1 trial with an experimentation length $m < T$ is generally beneficial for the patient compared to experimentation in all $T$ periods. Specifically, we can derive from Proposition 1 that the patient will receive the optimal treatment at least half of the time, that is, $E(z_i) \ge T/2$ for all $m$, with the minimum attained when $m = T$. Analogously from Proposition 2, the expected average outcome will be no smaller than the population average, that is, $E(\bar y_i) \ge \mu_A$ for all $m$, and equality holds when $m = T$.
Second, we can derive from the propositions that $E(z_i)$ and $E(\bar y_i)$ are increasing in $\sigma_B$ under the null $\mu_B = 0$. In other words, an N-of-1 trial becomes more beneficial to the patient when there is a larger variability in the treatment effects across patients.
Third, and importantly, it is instructive to consider the null case where $\mu_B = 0$ and the least squares estimator $\hat\beta_i^{LS}$ is used to make inference based on the experimentation phase data. Under these conditions, we can derive from Proposition 2 that the criterion $E(\bar y_i)$ is maximized at

$$m^* = \frac{2T}{\sqrt{9 + 8\xi_i T} + 3}, \tag{4.2}$$

where $\xi_i = \sigma_B^2/(\lambda_i\sigma^2)$ and $\lambda_i$ is defined in (3.4). While Equation (4.2) gives the optimal length $m^*$ as a function of $\sigma_B$, $\sigma$, and $\lambda_i$, it provides some general guidance:

Main Result 1. The optimal experimentation length $m^*$ is less than one-third of the total N-of-1 trial duration from a patient's perspective, that is, $m^* \lesssim T/3$.
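To make Main Result 1 tangible, the following Python sketch evaluates the closed form $m^* = 2T/(\sqrt{9 + 8\xi_i T} + 3)$ and compares it with a search over even experimentation lengths, since a balanced sequence needs an even number of periods. It assumes that, under $\mu_B = 0$, the part of $E(\bar y_i)$ that depends on $m$ is proportional to $(T - m)\sqrt{\xi_i m/(1 + \xi_i m)}$. The function names and the values of $\xi_i$ are hypothetical.

```python
import math


def m_star(T, xi):
    """Continuous maximizer of the expected average outcome when mu_B = 0."""
    return 2 * T / (math.sqrt(9 + 8 * xi * T) + 3)


def gain(m, T, xi):
    """Part of E(ybar_i) that depends on m when mu_B = 0 (up to a positive constant)."""
    return (T - m) * math.sqrt(xi * m / (1 + xi * m))


T = 18
for xi in (0.05, 1.0, 9.0):   # assumed values of sigma_B^2 / (lambda * sigma^2)
    m_cont = m_star(T, xi)
    # Restrict the search to even m so that the sequence can be balanced.
    m_even = max(range(2, T, 2), key=lambda m: gain(m, T, xi))
    print(f"xi={xi}: continuous m*={m_cont:.2f}, best even m={m_even}")
```

In this sketch the continuous maximizer never exceeds $T/3 = 6$, and stronger heterogeneity relative to noise (larger $\xi_i$) pushes the optimal experimentation length toward the minimum of two periods.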

Sample Size
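In this subsection, we discuss how much experimentation is adequate in terms of the sample size enrolled to the evaluation program. We first define the quality of an N-of-1 trial as the expected health outcome under the estimated optimal treatment $x_i^*$ (3.3). Assuming perfect adherence to the analysis results in the feedback period, the quality of an N-of-1 trial can be defined as $E(y_i^*)$, where $y_i^*$ denotes the outcome of patient $i$ in the validation phase of an N-of-1 trial. Analogously, let $y_i'$ denote the validation-phase outcome of patient $i$ under SOC. The primary evaluation question can then be formulated as testing

$$H_0: \Delta = 0 \quad \text{versus} \quad H_1: \Delta > 0, \tag{4.3}$$

where $\Delta = E(y_i^*) - E(y_i')$ measures the degree of quality improvement due to N-of-1 trials defined over the patient population. The hypotheses (4.3) can in turn be tested using the regular Z-statistic:

$$Z = \frac{\bar y^* - \bar y'}{\sqrt{(v^* + v')/n}}, \tag{4.4}$$

where $\bar y^*$ and $v^*$ are respectively the sample mean and the sample variance of $y_i^*$ in the $n$ patients randomized to an N-of-1 trial, and $\bar y'$ and $v'$ are the sample mean and sample variance of $y_i'$ in the $n$ patients randomized to SOC. Using standard arguments gives the power of the Z-test as

$$\Phi\left(\frac{\sqrt{n}\,\Delta}{\sqrt{\mathrm{var}(y_i^*) + \mathrm{var}(y_i')}} - c_\alpha\right), \tag{4.5}$$

where $c_\alpha$ is the upper $\alpha$th percentile of standard normal. In Appendix C, we derive the expressions for $\Delta$, $\mathrm{var}(y_i^*)$, and $\mathrm{var}(y_i')$ in (4.5) under the condition that $\tau_i^2 \equiv \tau^2$ for all $i$. This condition is met when the N-of-1 trial patients receive the same sequence $x_{it}$ in the experimentation phase or when $\epsilon_{it}$ has a specific variance-covariance structure. For example, under a compound symmetry, that is, $\rho_{st} \equiv \rho$, we can show $\lambda_i \equiv \lambda = 1 - \rho$, that is, having $\tau_i^2 \equiv (1 - \rho)\sigma^2/m$ for all $i$. Specifically, under the assumption that the SOC treatment $x_i'$ for a given patient is independent of the patient-specific treatment effect $\beta_i$, we have

$$\Delta = \mu_B\left\{2\Phi\left(\frac{\mu_B}{\sqrt{\sigma_B^2 + \tau^2}}\right) - 1\right\} + \frac{2\sigma_B^2}{\sqrt{\sigma_B^2 + \tau^2}}\,\phi\left(\frac{\mu_B}{\sqrt{\sigma_B^2 + \tau^2}}\right) - (2p_1 - 1)\mu_B, \tag{4.6}$$

and expressions (4.7) and (4.8) for $\mathrm{var}(y_i^*)$ and $\mathrm{var}(y_i')$, respectively. The above expressions account for population-level information about the treatments through the program parameter $p_1$. For example, if emerging evidence in the literature suggests a slight advantage of $x_i = 1$ over $x_i = -1$, we may assume the physicians in the program will select $x_i' = 1$ with $p_1 > 0.5$. In Appendix C, we provide expressions analogous to (4.6)-(4.8) for the situations where the physician may prescribe treatment $x_i'$ with patient-specific knowledge in addition to the population-level parameter $p_1$. However, we note that using expressions (4.6)-(4.8) may adequately reflect the standard of care where treatments are chosen based on population-level information rather than patient-specific knowledge. Furthermore, under the independence assumption of $x_i'$ and $\beta_i$, the power expression depends only on model parameters $\sigma_A$, $\sigma_B$, $\mu_B$, $\sigma$, $\tau_i$, for which information may be available to provide preliminary estimates, and the known design parameters $p_1$, $m$, $n$, $T$. Finally, under the null case $\mu_B = 0$:

Main Result 2. All else being equal, the power to demonstrate quality improvement due to N-of-1 trials (vs SOC) increases as heterogeneity of treatment effects $\sigma_B^2$ increases.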
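The power calculation can be sketched in Python as follows. This is an illustration under explicitly stated assumptions rather than a transcription of (4.6)-(4.8): the within-patient noise is independent across periods ($\rho = 0$, so $\tau^2 = \sigma^2/m$ for least squares), $\alpha_i$ is independent of $\beta_i$, the SOC treatment is independent of $\beta_i$, and $y_i^*$ and $y_i'$ are taken to be means over the $T - m$ validation periods. The function names are hypothetical, and the variance expressions in the code are our own working under these assumptions.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def n1_gain(mu_B, sigma_B, tau):
    """E(beta_i * x_i^*) for an estimator distributed as N(beta_i, tau^2)."""
    s = sqrt(sigma_B ** 2 + tau ** 2)
    return mu_B * (2 * STD.cdf(mu_B / s) - 1) + 2 * sigma_B ** 2 * STD.pdf(mu_B / s) / s


def design_power(n, m, T, sigma_A, sigma_B, sigma, mu_B=0.0, p1=0.5, alpha=0.05):
    """Return (Delta, power) of the one-sided Z-test with n patients per arm."""
    tau = sigma / sqrt(m)                       # assumes rho = 0 (lambda = 1)
    gain_n1 = n1_gain(mu_B, sigma_B, tau)
    gain_soc = (2 * p1 - 1) * mu_B              # SOC choice independent of beta_i
    delta = gain_n1 - gain_soc
    # Assumed variance of a validation-phase mean: random intercept, the
    # second moment of beta_i minus the squared gain, and averaged noise.
    base = sigma_A ** 2 + sigma_B ** 2 + mu_B ** 2 + sigma ** 2 / (T - m)
    var_n1 = base - gain_n1 ** 2
    var_soc = base - gain_soc ** 2
    c_alpha = STD.inv_cdf(1 - alpha)
    power = STD.cdf(sqrt(n) * delta / sqrt(var_n1 + var_soc) - c_alpha)
    return delta, power


delta, pw = design_power(n=34, m=4, T=18, sigma_A=4.8, sigma_B=4.8, sigma=1.6)
print(f"Delta = {delta:.2f}, power = {pw:.2f}")
```

Holding everything else fixed and increasing `sigma_B` in this sketch raises the power, in line with Main Result 2.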

Optimal Length of Experimentation
We use the MCS natural history data in Mitsumoto et al. (2019) to inform the design of the evaluation program for people with ALS. Specifically, we fitted a random effects model to the data and obtained estimates of $\sigma_A = 4.8$ and $\sigma = 1.6$. For simplicity in illustration, we further assume that the within-subject noise is conditionally independent given the population-level parameters. Figure 2 plots the patient-centric criteria against different values of $\mu_B$, $\sigma_B$, and $m$ for $T = 18$. While the two criteria adopt different metrics, they are maximized when $m$ is relatively small. In Figure 2 and in all $\mu_B$, $\sigma_B$ that we have considered (not shown here), the optimal values of $m$ range from 2 to 6 for both criteria. This is consistent with what Main Result 1 implies: $m^* \lesssim T/3 = 6$.

Sample Size and Effect Size
Main Result 2 implies that $\sigma_B^2$ may be viewed as an effect size in power calculation, while the power also depends on other model parameters and design parameters. As in conventional practice, the choice of an effect size should be based on a clinically meaningful difference, whereas the other model parameters (e.g., $\sigma_A$, $\sigma$, etc.) may be based on pilot data if available. Figure 3 plots the power against $n$, $m$ for three different effect sizes $\sigma_B$ for a one-sided test at 5% significance. Under each effect size, we identify the smallest $n$ that achieves 80% power for any $m$ and obtain that the required $(n, m)$ are $(210, 12)$, $(60, 6)$, and $(34, 4)$, respectively, for $\sigma_B = 1.6$, 3.2, and 4.8. We note that under a small effect size $\sigma_B = 1.6$, the required $m = 12$ is greater than $T/3$. In light of Main Result 1, we may instead adopt $(n, m) = (210, 6)$ in order to maximize the benefits of the N-of-1 trials to the patients. The power of this modified design is 78%, which is slightly lower than the target 80%. Generally, we observe from Figure 3 that the impact of $m$ on power is relatively small compared to that of $n$ and $\sigma_B$ except when the effect size is small ($\sigma_B = 1.6$).
To determine if a specific value of $\sigma_B$ corresponds to a clinically meaningful effect size, relating $\sigma_B$ to $\Delta$ using (4.6) may be useful, as $\Delta$ lives on the same scale as the measurement outcomes. In our application, a 3- to 4-point change on the MCS will represent a clinically meaningful shift. Based on the pilot data and assumptions, the effect sizes $\sigma_B = 1.6$, 3.2, 4.8 translate to a degree of quality improvement $\Delta = 1.2$, 2.5, 3.8, respectively. Thus, we set the sample size for this evaluation program at $n = 34$ with four treatment periods (two on mexiletine and two on baclofen) based on the results for $\sigma_B = 4.8$. Generally, the minimally clinically important heterogeneity (MCIH, $\sigma_{B,\min}$) may be determined relating to the minimally clinically important change (MCID, $\Delta_{\min}$) using (4.6).
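The translation between $\sigma_B$ and $\Delta$ can be sketched as below. The code assumes the null case $\mu_B = 0$, least squares estimation, and independent within-patient noise, under which $\Delta = 2\sigma_B^2\phi(0)/\sqrt{\sigma_B^2 + \sigma^2/m}$; this form is our own working from the model and reproduces the values of $\Delta$ quoted above. The bisection routine for the MCIH and all function names are hypothetical.

```python
from math import pi, sqrt


def delta_null(sigma_B, sigma, m):
    """Quality improvement when mu_B = 0 (least squares, independent noise)."""
    tau2 = sigma ** 2 / m
    # phi(0) = 1 / sqrt(2 * pi)
    return 2 * sigma_B ** 2 / sqrt(2 * pi * (sigma_B ** 2 + tau2))


def min_heterogeneity(delta_min, sigma, m, hi=100.0, tol=1e-10):
    """Smallest sigma_B whose Delta reaches delta_min, found by bisection."""
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if delta_null(mid, sigma, m) < delta_min:
            lo = mid
        else:
            hi = mid
    return hi


for sigma_B, m in ((1.6, 12), (3.2, 6), (4.8, 4)):
    print(f"sigma_B={sigma_B}, m={m}: Delta={delta_null(sigma_B, 1.6, m):.1f}")

# Heterogeneity needed for a 3-point MCS improvement with m = 4 (illustrative).
sigma_B_min = min_heterogeneity(3.0, sigma=1.6, m=4)
```

Bisection is adequate here because $\Delta$ is increasing in $\sigma_B$ under these assumptions; the upper bound `hi` must be chosen large enough that the target $\Delta_{\min}$ is attainable.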

Power for Comparing to Fully Informed SOC
The calculations in the previous subsection assume the null case $\mu_B = 0$ under which the power (4.5) does not depend on the parameter $p_1$. Under a non-null case, the value of $p_1$ reflects how informed the practice is about the population-level treatment effect. For example, if evidence in the literature suggests $\mu_B > 0$, an informed practice will prescribe $x_i' = 1$ with $p_1 > 0.5$. Specifically, standard of care that is fully informed by the literature may be represented by $p_1 = \Phi(\mu_B/\sigma_B)$. Table 1 shows that as the average treatment effect $\mu_B$ grows larger and a fully informed SOC practice prescribes $x_i' = 1$ more often (i.e., larger $p_1$), the power to demonstrate quality improvement in (4.3) becomes smaller. On the one hand, this suggests that if there is overwhelming evidence favoring $x_i' = 1$ over $x_i' = -1$ in the literature, conducting N-of-1 trials will have diminished effect provided that the standard of care is fully informed.
On the other hand, even with a large average treatment effect $\mu_B = 2.4 = 0.5\sigma_B$, the quality improvement due to N-of-1 trials is $\Delta > 3$, which is still clinically meaningful, and the power is still reasonably high (68%) in this sensitivity analysis. This suggests that evaluating N-of-1 trials is a worthwhile endeavor unless there is overwhelming evidence of a large average treatment effect.
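The sensitivity analysis against a fully informed SOC can be sketched in the same way. The code below assumes $p_1 = \Phi(\mu_B/\sigma_B)$, independent within-patient noise, $\alpha_i$ independent of $\beta_i$, and validation-phase means as outcomes; the variance terms are our own working under these assumptions, and the function name is hypothetical. The default design values are those used in this section ($n = 34$, $m = 4$, $T = 18$, $\sigma_A = \sigma_B = 4.8$, $\sigma = 1.6$).

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def informed_soc_design(mu_B, n=34, m=4, T=18, sigma_A=4.8, sigma_B=4.8,
                        sigma=1.6, alpha=0.05):
    """Return (p1, Delta, power) against a fully informed SOC, assuming rho = 0."""
    p1 = STD.cdf(mu_B / sigma_B)                 # fully informed practice
    s = sqrt(sigma_B ** 2 + sigma ** 2 / m)      # marginal sd of the LS estimate
    gain_n1 = (mu_B * (2 * STD.cdf(mu_B / s) - 1)
               + 2 * sigma_B ** 2 * STD.pdf(mu_B / s) / s)
    gain_soc = (2 * p1 - 1) * mu_B
    delta = gain_n1 - gain_soc
    # Assumed variance of a validation-phase mean before subtracting squared gains.
    base = sigma_A ** 2 + sigma_B ** 2 + mu_B ** 2 + sigma ** 2 / (T - m)
    sd_diff = sqrt((base - gain_n1 ** 2) + (base - gain_soc ** 2))
    power = STD.cdf(sqrt(n) * delta / sd_diff - STD.inv_cdf(1 - alpha))
    return p1, delta, power


for mu_B in (0.0, 1.2, 2.4):
    p1, delta, power = informed_soc_design(mu_B)
    print(f"mu_B={mu_B}: p1={p1:.2f}, Delta={delta:.2f}, power={power:.2f}")
```

Under these assumptions, the sketch gives a quality improvement above 3 and a power close to 68% at $\mu_B = 2.4$, matching the figures quoted in the text.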
The numerical results in this article, and power and the patient-centric criteria in general, can be computed using tools available at: https://roadmap2health.io/hdsr/n1power/.

Discussion
N-of-1 trials have been increasingly used as a design tool to bridge practice and science in rare diseases (Müller et al. 2021; n.d.b). However, the literature is missing concrete guidelines on N-of-1 designs as to how much experimentation is appropriate. A fundamental issue is the articulation of a framework that will facilitate the evaluation of the usefulness of N-of-1 trials. In this article, we introduce an evaluation framework and outline the basic elements in an evaluation program for N-of-1 trials, namely, an experimentation phase, a feedback period, and a validation phase. In the literature, the reporting of N-of-1 trials mostly focuses only on the results of the experimentation phase, where patients explore the different treatments sequentially under a rigorous clinical protocol such as randomization, blinding, and scheduled follow-up. The feedback period and the validation phase are the critical elements in the planning and the conduct of N-of-1 trials but are, unfortunately, often omitted in the description of the design and the analytical plan.
Specifically, the length of the validation phase, relative to that of the experimentation phase, should be given careful consideration. We have demonstrated theoretically and numerically that the optimal length of experimentation from the patient's perspective should be no greater than one-third of the entire study duration. This implies a relatively long validation phase, suggesting the importance of reproducing the quality of the decisions due to N-of-1 trials with additional follow-up. Our theoretical results also provide guidance on how many patients are needed in order to adequately power for testing quality improvement. Importantly, the relative length of experimentation and validation has minimal impact on the power. In other words, little conflict exists between the goal of maximizing patient benefits and maximizing power.
The feedback period facilitates evidence-based treatment decisions using data measured in the experimentation phase. Summarizing the relative benefits of the treatments via a single numerical statistic is a pragmatic way to present such evidence, because the information can be objectively presented and quickly digested by stakeholders. We have developed design calculus based on the model-based least squares estimation, which is quick to compute and produces unbiased estimates of patient-specific treatment effects under a broad range of scenarios. Other more sophisticated model-based methods may be used to deal with more complex situations. For example, when we observe a high volume of outcome measures via wearable devices, we could extend model (3.2) to an autoregressive model with multiple observations per treatment period (Kronish et al. 2019). In practice, treatment decisions are likely determined based on the totality of evidence. For example, in situations where a treatment that apparently benefits a patient may have side effects, a possibly less effective treatment may be preferred if it is more easily tolerated. Considerations of multiple outcomes in the analysis during the feedback period will likely increase adherence and will warrant further empirical, domain-specific research. Overall, as the feedback period potentially changes the treatment decisions, and hence the outcomes, in the validation phase, it can be viewed as an integral part of the intervention component. We may thus experiment in a randomized fashion with different elements in the feedback period for different individuals: we may consider presenting different endpoints (e.g., muscle cramp or safety), using a single endpoint, a composite outcome, or multivariate endpoints, using different types of analyses (e.g., intent-to-treat vs per-protocol), and asking patients for their satisfaction and preference (Cheung et al. 2020).
Some considerations, assumptions, and limitations for power calculation in conventional randomized controlled trials also apply for N-of-1 trials. First, power calculation involves the inputs of a number of nuisance model parameters (e.g., $\sigma_A$, $\sigma$) as well as the effect size ($\sigma_B^2$). While the effect size $\sigma_B^2$ should be determined based on a clinically relevant shift, the other parameters ideally can be based on estimates from pilot data. However, in situations where robust pilot data are not available, a potentially useful strategy is to leverage the concepts of adaptive designs (U.S. Food & Drug Administration 2019) whereby the model parameters are updated using interim data in the evaluation program and the updates in turn inform a reassessment of the degree of quality improvement and the sample size required.
Second, our derivations assume that patients in both arms comply with their treatments in the following sense: patients in the N-of-1 trials adhere to the estimated optimal treatments based on the experimentation phase data, and patients in the SOC continue with the same treatment as in the experimentation phase. If there is prior information about

Combining (A.2) and (A.3) gives

$$E(z_i \mid \alpha_i, \beta_i) = \frac{m}{2} + (T - m)\,\Phi\left(\frac{|\beta_i|}{\tau_i}\right), \tag{A.4}$$

which is free of $\alpha_i$. Since $E(z_i) = E\{E(z_i \mid \beta_i)\}$, the expectations of both sides in (A.4) are to be taken with respect to the distribution of $\beta_i \sim N(\mu_B, \sigma_B^2)$ to complete the proof. By change of variable, we have

$$E\left\{\Phi\left(\frac{|\beta_i|}{\tau_i}\right)\right\} = \Pr\left(W \le \frac{|\mu_B + \sigma_B U|}{\tau_i}\right). \tag{A.5}$$

The proof is completed by substituting (A.5) into (A.4).

B.1. Lemma 1
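Derivations of $E(\bar y_i)$ will be facilitated by first noting the following lemma:

Lemma 1. Let $V \sim N(\mu_V, \sigma_V^2)$. Then

$$E\{V\Phi(V)\} = \mu_V\,\Phi\left(\frac{\mu_V}{\sqrt{1 + \sigma_V^2}}\right) + \frac{\sigma_V^2}{\sqrt{1 + \sigma_V^2}}\,\phi\left(\frac{\mu_V}{\sqrt{1 + \sigma_V^2}}\right),$$

where $\Phi$ and $\phi$ respectively denote the standard normal distribution function and density function.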

Proof of Lemma 1:
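Using the definition of expectation, we derive

$$E\{V\Phi(V)\} = \mu_V \Pr(U - \sigma_V W \le \mu_V) + \sigma_V E\{W\Phi(\mu_V + \sigma_V W)\}, \tag{B.1}$$

where $U$, $W$ are independent standard normal variables. Thus, $U - \sigma_V W \sim N(0, 1 + \sigma_V^2)$, and the first term in (B.1) can be evaluated as

$$\mu_V \Pr(U - \sigma_V W \le \mu_V) = \mu_V\,\Phi\left(\frac{\mu_V}{\sqrt{1 + \sigma_V^2}}\right). \tag{B.2}$$

Next, the single integral in the second term in (B.1) can be evaluated using integration by parts, and

$$\sigma_V E\{W\Phi(\mu_V + \sigma_V W)\} = \sigma_V^2 \int \phi(\mu_V + \sigma_V w)\,\phi(w)\,dw = \frac{\sigma_V^2}{\sqrt{1 + \sigma_V^2}}\,\phi\left(\frac{\mu_V}{\sqrt{1 + \sigma_V^2}}\right) \tag{B.3}$$

can be derived by straightforward derivation. The proof of Lemma 1 is thus completed by plugging (B.2) and (B.3) into (B.1).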
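A quick numerical check of the lemma can be carried out with a simple quadrature. The sketch below assumes the lemma has the form $E\{V\Phi(V)\} = \mu_V\Phi(\mu_V/\sqrt{1 + \sigma_V^2}) + \sigma_V^2\phi(\mu_V/\sqrt{1 + \sigma_V^2})/\sqrt{1 + \sigma_V^2}$ for $V \sim N(\mu_V, \sigma_V^2)$ and compares it with a midpoint-rule approximation of the defining integral. The function names, the integration range, and the test values are hypothetical choices.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def lemma1_closed_form(mu_V, sigma_V):
    """Closed form of E{V * Phi(V)} for V ~ N(mu_V, sigma_V^2)."""
    r = sqrt(1 + sigma_V ** 2)
    return mu_V * STD.cdf(mu_V / r) + sigma_V ** 2 / r * STD.pdf(mu_V / r)


def lemma1_numeric(mu_V, sigma_V, half_width=10.0, steps=20000):
    """Midpoint-rule approximation of E{V * Phi(V)} with V = mu_V + sigma_V * W."""
    h = 2 * half_width / steps
    total = 0.0
    for k in range(steps):
        w = -half_width + (k + 0.5) * h          # midpoint of the k-th cell
        v = mu_V + sigma_V * w
        total += v * STD.cdf(v) * STD.pdf(w) * h
    return total


for mu_V, sigma_V in ((0.0, 1.0), (0.5, 2.0), (-1.0, 0.5)):
    gap = abs(lemma1_closed_form(mu_V, sigma_V) - lemma1_numeric(mu_V, sigma_V))
    print(f"mu_V={mu_V}, sigma_V={sigma_V}: |closed form - quadrature| = {gap:.1e}")
```

The two evaluations agree to within the quadrature error for the values tried, which supports the stated form without replacing the analytical proof.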

B.2. Proof of Proposition 2
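Recall that $\bar y_i$ denotes the average outcome of patient $i$ in all $T$ treatment periods in an N-of-1 trial. Hence,

$$E(\bar y_i \mid \alpha_i, \beta_i) = \alpha_i + \frac{T - m}{T}\,E(\beta_i x_i^* \mid \beta_i). \tag{B.4}$$

Equation (B.4) holds because, under the balanced design (3.1), $\sum_{t=1}^{m} x_{it} = 0$, so the treatment effect cancels over the experimentation phase. Next, since $E(x_i^* \mid \beta_i) = 2\Phi(\beta_i/\tau_i) - 1$, we have

$$E(\beta_i x_i^*) = \mu_B\left\{2\Phi\left(\frac{\mu_B}{\sqrt{\sigma_B^2 + \tau_i^2}}\right) - 1\right\} + \frac{2\sigma_B^2}{\sqrt{\sigma_B^2 + \tau_i^2}}\,\phi\left(\frac{\mu_B}{\sqrt{\sigma_B^2 + \tau_i^2}}\right). \tag{B.5}$$

Expression (B.5) is obtained by applying Lemma 1 with $V = \beta_i/\tau_i$. Putting (B.5) into (B.4) gives

$$E(\bar y_i) = \mu_A + \frac{T - m}{T}\left[\mu_B\left\{2\Phi\left(\frac{\mu_B}{\sqrt{\sigma_B^2 + \tau_i^2}}\right) - 1\right\} + \frac{2\sigma_B^2}{\sqrt{\sigma_B^2 + \tau_i^2}}\,\phi\left(\frac{\mu_B}{\sqrt{\sigma_B^2 + \tau_i^2}}\right)\right], \tag{B.6}$$

thus completing the proof of Proposition 2.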

B.3. Derivation of optimal experimentation length m* and Main Result 1
For least squares $\hat\beta_i^{LS}$, the variance $\tau_i^2 = \lambda_i\sigma^2/m$, where $\lambda_i = 1 + \sum_{s \ne t} x_{is} x_{it} \rho_{st}/m$. Further supposing $\mu_B = 0$ simplifies (B.6) to

$$E(\bar y_i) = \mu_A + \frac{2\phi(0)\sigma_B}{T}\,(T - m)\sqrt{\frac{\xi_i m}{1 + \xi_i m}}.$$

Hence, maximizing $E(\bar y_i)$ as a function of $m$ is equivalent to maximizing the function

$$h(m) = (T - m)\sqrt{\frac{\xi_i m}{1 + \xi_i m}},$$

where $\xi_i = \sigma_B^2/(\lambda_i\sigma^2)$ is free of $m$. Using standard calculus arguments, we can show that the maximizer $m^*$ of $h(m)$ solves the equation $2\xi_i m^{*2} + 3m^* - T = 0$ or equivalently,

$$m^* = \frac{\sqrt{9 + 8\xi_i T} - 3}{4\xi_i}. \tag{B.7}$$

The derivation of $m^*$ is completed by multiplying $\sqrt{9 + 8\xi_i T} + 3$ in the numerator and the denominator of (B.7), which gives $m^* = 2T/(\sqrt{9 + 8\xi_i T} + 3)$.
Now, since $\xi_i \ge 0$, we have $m^* \le 2T/(\sqrt{9} + 3) = T/3$. As a practical note, due to discreteness in $m$, the optimal $m$ may be a result of rounding up $m^*$. Hence a slightly less sharp inequality would be $m^* \lesssim T/3 < T/3 + 1$.
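For readers who wish to trace the "standard calculus arguments", one possible route is sketched below. It assumes that the function being maximized is proportional to $h(m) = (T - m)\sqrt{\xi_i m/(1 + \xi_i m)}$ on $0 < m < T$ and works with $\log h(m)$; this is a worked sketch under that assumption rather than the original derivation.

```latex
\begin{aligned}
\log h(m) &= \log(T - m) + \tfrac{1}{2}\log(\xi_i m) - \tfrac{1}{2}\log(1 + \xi_i m), \\
\frac{d}{dm}\log h(m) &= -\frac{1}{T - m} + \frac{1}{2m} - \frac{\xi_i}{2(1 + \xi_i m)}
  = -\frac{1}{T - m} + \frac{1}{2m(1 + \xi_i m)}, \\
\frac{d}{dm}\log h(m) = 0
  &\iff T - m = 2m(1 + \xi_i m)
  \iff 2\xi_i m^2 + 3m - T = 0 .
\end{aligned}
```

Both terms of the derivative are decreasing in $m$ on $(0, T)$, so the stationary point is the unique maximizer, and the positive root of the quadratic is $m^* = (\sqrt{9 + 8\xi_i T} - 3)/(4\xi_i)$.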

Appendix C. Theoretical Results Concerning Power
In this section, we derive the expressions involved in the power of the Z-test, namely, $\Delta$, $\mathrm{var}(y_i^*)$, and $\mathrm{var}(y_i')$.
Recall that $p_0$ and $p_1$ respectively denote the probabilities that the treating physicians will prescribe mexiletine ($x_{it} = -1$) and baclofen ($x_{it} = 1$) under the treatment program. Based on model (3.2), we can express the quality of an N-of-1 trial as $E(y_i^*) = \mu_A + E(\beta_i x_i^*)$, and analogously $E(y_i') = \mu_A + E(\beta_i x_i')$, where $x_i'$ is the treatment given to patient $i$ in SOC.
Under the independence assumption of $x_i'$ and $\beta_i$, we further obtain $E(y_i') = \mu_A + (2p_1 - 1)\mu_B$, and

$$\Delta = E(\beta_i x_i^*) - (2p_1 - 1)\mu_B,$$

where $E(\beta_i x_i^*)$ is given in (B.5). Next, $\mathrm{var}(y_i^*)$ is evaluated, with the last equality in its derivation a result of (C.1). Similarly, we can obtain $\mathrm{var}(y_i')$. Finally, under the null $\mu_B = 0$, Main Result 2 is proved by dividing the numerator and the denominator of the resulting power expression by $\sigma_B^2$, as a result of which the numerator will be a constant and the denominator will be a decreasing function of $\sigma_B^2$.
For the situations where the physicians have patient-specific knowledge to inform treatments under the SOC, we may postulate that

$$\Pr(x_i' = 1 \mid \beta_i) = \theta_C\,I(\beta_i > 0) + (1 - \theta_C)\,p_1, \tag{C.2}$$

where the parameter $\theta_C$ indicates how perfect the knowledge the physicians have about the specific best treatments for their patients, with $\theta_C = 1$ indicating perfect knowledge and $\theta_C = 0$ indicating no knowledge beyond the population-level information $p_1$. Under the SOC treatment system (C.2), we have

$$E(\beta_i x_i') = E\{\beta_i E(x_i' \mid \beta_i)\} = 2\theta_C E\{\beta_i I(\beta_i > 0)\} - \theta_C\mu_B + (1 - \theta_C)\mu_B(2p_1 - 1), \tag{C.3}$$

where $E\{\beta_i I(\beta_i > 0)\} = \sigma_B\phi(\mu_B/\sigma_B) + \mu_B\Phi(\mu_B/\sigma_B)$. Using (B.5), (C.3), and (C.4), after some algebra, we have $\Delta = E(\beta_i x_i^*) - E(\beta_i x_i')$ in closed form. It is instructive to consider the null case $\mu_B = 0$, under which $E(\beta_i x_i') = 2\theta_C\sigma_B\phi(0)$ and hence $\Delta = 2\sigma_B\phi(0)\{\sigma_B/\sqrt{\sigma_B^2 + \tau^2} - \theta_C\}$.

Table 1. Quality improvement $\Delta$ and power for comparing to a fully informed SOC (standard of care) with $p_1 = \Phi(\mu_B/\sigma_B)$, with $n = 34$, $m = 4$, $T = 18$, $\sigma_A = 4.8$, $\sigma_B = 4.8$, $\sigma = 1.6$, and $\rho = 0$. Columns: $\mu_B$; $p_1 = \Phi(\mu_B/\sigma_B)$; $\Delta$.

Figure 1. Schema of an evaluation program for personalized (N-of-1) trials comparing treatment A and treatment B. Under the evaluation program, patients are randomized to either an N-of-1 trial or the standard of care.

Figure 2. Patient-centric criteria vs experimentation length $m$ under different values of $\mu_B$ and $\sigma_B$. Left: Expected number of optimal treatment periods vs $m$. Right: Expected average outcome (negative of MCS) of patient vs $m$.