The Family of Single-Case Experimental Designs

Single-case experimental designs (SCEDs) represent a family of research designs that use experimental methods to study the effects of treatments on outcomes. The fundamental unit of analysis is the single case— which can be an individual, clinic, or community—ideally with replications of effects within and/or between cases. These designs are flexible and cost-effective and can be used for treatment development, translational research, personalized interventions, and the study of rare diseases and disorders. This article provides a broad overview of the family of single-case experimental designs with corresponding examples, including reversal designs, multiple baseline designs, combined multiple baseline/reversal designs, and integration of single-case designs to identify optimal treatments for individuals into larger randomized controlled trials (RCTs). Personalized N-of-1 trials can be considered a subcategory of SCEDs that overlaps with reversal designs. Relevant issues for each type of design—including comparisons of treatments, design issues such as randomization and blinding, standards for designs, and statistical approaches to complement visual inspection of single-case experimental designs—are also discussed.

SCEDs provide a framework to determine whether changes in a target behavior(s) or symptom are in fact a function of the intervention.The fundamentals of an SCED involve repeated measurement, replication of conditions (e.g., baseline and intervention conditions), and the analysis of effects with respect to each individual serving as his or her own control.This process can be useful for identifying the optimal treatment for an individual (Dallery & Raiff, 2014;Davidson et al., 2021), treating rare diseases (Abrahamyan et al., 2016), and implementing early phase translational research (Czajkowski et al., 2015).SCEDs can be referred to as 'personalized (N-of-1) trials' when used this way, but they also have broad applicability to a range of scientific questions.Results from SCEDs can be aggregated using meta-analytic techniques to establish generalizable methods and treatment guidelines (Shadish, 2014;Vannest et al., 2018).Figure 1 presents the main family of SCEDs, and shows how personalized (N-of-1) trials fit into these designs (Vohra et al., 2016).
The figure also distinguishes between experimental and nonexperimental single-case designs.In the current article, we provide an overview of SCEDs and thus a context for the articles in this special issue focused on personalized (N-of-1) trials.Our focus is to provide the fundamentals of these designs, and more detailed treatments of data analysis (Moeyaert & Fingerhut, 2022;Schork, 2022) conduct and reporting standards (Kravitz & Duan, 2022;Porcino & Vohra, 2022), and other methodological considerations are provided in this special issue.Our hope is that this article will inspire a diverse array of students, engineers, scientists, and practitioners to further explore the utility, rigor, and flexibility of these designs.
The most common approach to evaluating the effectiveness of interventions on outcomes is using randomized controlled trials (RCTs).RCTs provide an idea of the average effect of an intervention on outcomes.People do not all change at the same rate or in the same way, however; variability in both how people change and the effect of the intervention is inevitable (Fisher et al., 2018;Normand, 2016;Roustit et al., 2018).These sources of variability are conflated in a typical RCT, leading to heterogeneity of treatment effects (HTE).Research on HTE has shown variability in outcomes in RCTs, and in some studies very few people actually exhibit the benefits of that treatment (Williams, 2010).One approach in RCTs is to assess moderators of treatment response to identify individual differences that may predict response to a treatment.This approach may not limit variability in response, and substantial reduction in variability of treatment for subgroups in comparison to the group as a whole is far from assured.Even if variability is reduced, the average effect for that subgroup may not be representative of individual members of the subgroup.
SCEDs can identify the optimal treatment for an individual person rather than the average person in a group (Dallery & Raiff, 2014;Davidson et al., 2021;Hekler et al., 2020).SCEDs are multiphase experimental designs in which a great deal of data is collected on a single person, said person serves as his or her own control (Kazdin, 2011(Kazdin, , 2021)), and the order of presentation of conditions can be randomized to enhance experimental control.That is, a person's outcomes in one phase are compared to outcomes in another phase.In a typical study, replications are achieved within and/or across several individuals; this allows for strong inferences about causation between behavior and the treatment (or levels thereof).Achieving replications is synonymous with achieving experimental control.
We provide an overview of three experimental designs that can be adapted for personalized medicine: reversal, multiple baseline, and combined reversal and multiple baseline designs, and we discuss how SCEDs can be integrated into RCTs.These designs focus on demonstrating experimental control of the relationship between treatment and outcome.Several general principles common to all of the designs are noteworthy (Lobo et al., 2017).First, in many studies, treatment effects are compared with control conditions with a no-intervention baseline as the initial condition.To reduce threats to internal validity of the study, the order of assignment of interventions can be randomized (Kratochwill & Levin, 2010) and, when possible, the intervention and data collection can be blinded.The demonstration of experimental control across conditions or people needs to be replicated several times (three replications is the minimum) to ensure confidence of the relationship between treatment and outcome (Kratochwill et al., 2010;Kratochwill & Levin, 2015).Demonstrating stability of data within a phase or, otherwise, no trend in the direction of treatment effects prior to starting treatment is particularly important.Stability refers to the degree of variability in the data path over time (e.g., data points must fall within a 15% range of the median for a condition).Thus, phase length needs to be flexible for the sake of determining stability and trend within a phase, but a minimum of 5 data points per phase has been recommended (Kratochwill et al., 2013).The focus of the intervention's effects is on clinically rather than statistically significant effects with the target effect prespecified and considered in interpretation of the relevance of the effect for clinical practice (Epstein et al., 2021).In addition, multiple dependent outcomes can be simultaneously measured (Epstein et al., 2021).SCEDs can be used to test whether a variable mediates the effect of a treatment on symptoms or behavior (Miočević et al., 2020;Riley & Gaynor, 2014).Visual inspection of graphical data is typically used to determine treatment effects, and statistical methods are commonly used to assist in interpretation of graphical data (Epstein et al., 2021).Furthermore, a growing number of statistical approaches can summarize treatment effects and provide effect sizes (Kazdin, 2021;Moeyaert & Fingerhut, this issue;Pustejovsky, 2019;Shadish et al., 2014).Data across many SCED trials can be aggregated to assess the generality of the treatment effects to help address for whom and under what conditions an intervention is effective (Branch & Pennypacker, 2013;Shadish, 2014;Van den Noortgate & Onghena, 2003).

Reversal Designs
A reversal design collects behavioral or biological outcome data in at least two phases: a baseline or notreatment phase (labeled as 'A') and the experimental or treatment phase (labeled as 'B').The design is called a reversal design because there must be reversals or replications of phases for each individual; for example, in an ABA design, the baseline phase is replicated (Kazdin, 2011).Ideally, three replications of treatment effects are used to demonstrate experimental control (Kratochwill et al., 2010;Kratochwill & Levin, 1992).Figure 2 shows hypothetical results from an A1B1A2B2 design.The graph shows three replications of treatment effects (A1 versus B1, B1 versus A2, A2 versus B2) across four participants.Each phase was carried out until stability was evident from visual inspection of the data as well as absence of trends in the direction of the desired effect.
The replication across participants increases the confidence in the effectiveness of the intervention.Extension of this design is possible by comparing multiple interventions, as well.The order of the treatments should be randomized, especially when the goal is to combine SCEDs across participants.
Reversal designs can be more dynamic and compare several treatments.A common approach in personalized medicine would be to compare two or more doses of or different components of the same treatment (Ward- Horner & Sturmey, 2010).For example, two drug doses could be compared using an A1B1C1B2C2 design, where A represents placebo and B and C represent the different drug doses (Guyatt et al., 1990).In the case of drug studies, the drug/placebo administration can be double blinded.A more complex design could be A1B1A2C1A3C2A4B2, which would yield multiple replications of the comparison between drug and placebo.
Based on the kinetics of the drug and the need for a washout period, the design could also be A1B1C1B2C2.This would provide three demonstration of treatment effects: B1 to C1, C1 to B2, and B2 to C2.Other permutations could be planned strategically to identify the optimal dose for each individual.
Advantages of SCED reversal designs are their ability to experimentally show that a particular treatment was functionally related to a particular change in an outcome variable for that person.This is the core principle of personalized medicine: an optimal treatment for an individual can be identified (Dallery & Raiff, 2014;Davidson et al., 2021;Guyatt et al., 1990;Hekler et al., 2020;Lillie et al., 2011).These designs can work well for studying the effect of interventions on rare diseases in which collecting enough participants with similar characteristics for an RCT would be unlikely.An additional strength is the opportunity for the clinical researcher who also delivers clinical care to translate basic science findings or new findings from RCTs to their patients, who can potentially benefit (Dallery & Raiff, 2014;Hayes, 1981).Research suggests that the trickledown of new developments and hypotheses to their support in RCTs can take more than 15 years; many important advancements in the medical and behavior sciences are likely not to be implemented rapidly enough (Riley et al., 2013).The ability to test new intervention developments using scientific principles could speed up their translation into practice.
Limitations to SCED designs, however, are worth noting.Firstly, in line with the expectation that the outcome returns to baseline levels, reversals may require removal of the treatment.If the effect is not quickly reversible, then the designs are not relevant.A washout period may be placed in-between phases if the effect is not immediately reversible; for example, a drug washout period could be planned based on the half-life of drug.
Secondly, the intervention should have a relatively immediate effect on the outcome.If many weeks to months are needed for some interventions to show effects, a reversal design may not be optimal unless the investigator is willing to plan a lengthy study.Thirdly, the design depends on comparing stable data over conditions.If achieving stability due to uncontrolled sources of biological or environmental variation is not possible, a reversal design may not be appropriate to evaluate a treatment, though it may be useful to identify the sources of variability (Sidman, 1960).Finally, for a reversal to a baseline, a no-treatment phase may be inappropriate in investigating treatment effects for a very ill patient.

Multiple Baseline Designs
An alternative to a reversal design is the multiple baseline design, which does not require reversal of conditions to establish experimental control.There are three types of multiple baseline designs: multiple baseline across people, behaviors, and settings.The most popular is the multiple baseline across people, in which baselines are established for three or more people for the same outcome (Cushing et al., 2011;Meredith et al., 2011).
Treatment is implemented after different durations of baseline across individuals.The order of treatment implementation across people can be randomized (Wen et al., 2019).Figure 3 shows an example across three individuals.In this hypothetical example, baseline data for each person are relatively stable and not decreasing, and reductions in the dependent variable are only observed after introduction of the intervention.Inclusion of one control person, who remains in baseline throughout the study and provides a control for extended monitoring, is also possible.Another variation is to collect baseline data intermittently in a 'probe' design, which can minimize burden associated with simultaneous and repeated measurement of outcomes (Byiers et al., 2012;Horner & Baer, 1978).If the outcomes do not change during baseline conditions and the changes only occur across participants after the treatment has been implemented-and this sequence is replicated across several people-change in the outcome may be safely attributed to the treatment.The length of the baselines still must be long enough to show stability and no trend toward improvement until the treatment is implemented.
The two other multiple baseline designs focus on individual people: the multiple baseline across settings and the multiple baseline across behaviors (Boles et al., 2008;Lane-Brown & Tate, 2010).An example of a multiple baseline across settings would be a dietary intervention implemented across meals.An intervention that targets a reduction in consumption of high-glycemic index foods, or foods with added sugar across meals, could be developed with the order of meals randomized.For example, someone may be randomized to reduce sugar-added or high-glycemic index foods for breakfast without any implementation at lunch or dinner.
Implementation of the diet at lunch and then dinner would occur after different durations of baselines in these settings.An example of multiple baseline across behaviors might be to use feedback to develop a comprehensive exercise program that involves stretching, aerobic exercise, and resistance training.Feedback could target improvement in one of these randomly selected behaviors, implemented in a staggered manner.
The main limitation to a multiple baseline design is that some people (or behaviors) may be kept in baseline or control conditions for extended periods before treatment is implemented.Of course, failure to receive an effective treatment is common in RCTs for people who are randomized to control conditions, but unlike control groups in RCTs, all participants eventually receive treatment.
Finally, while the emphasis in personalized medicine is the identification of an optimal treatment plan for an individual person, situations in which multiple baselines across people prove relevant for precision medicine may arise.For example, identification of a small group of people with common characteristics-perhaps with a rare disease and for which a multiple-baseline-across-people design could be used to test an intervention more effectively than a series of personalized designs-is possible.In a similar vein, differential response to a common treatment in a multiple-baseline-across-people design can help to identify individual differences that can compromise the response to a treatment.

Integrating Multiple Baseline and Reversal Designs
While reversal designs can be used to compare effects of interventions, multiple baseline designs provide experimental control for testing one intervention but do not compare different interventions.One way to take advantage of the strengths of both designs is to combine them.For example, the effects of a first treatment could be studied using a multiple-baseline format and, after experimental control has been established, return to baseline prior to the commencement of a different treatment, which may be introduced in a different order.
These comparisons can be made for several different interventions with the combination of both designs to demonstrate experimental control and compare effects of the interventions.
Figure 4 shows a hypothetical example of a combined approach to identify the best drug to decrease blood pressure.Baseline blood pressures are established for three people under placebo conditions before new drug X is introduced across participants in a staggered fashion to establish relative changes in blood pressure.All

Other Varieties of Single-Case Experimental Designs
Other less commonly used designs within the family of SCEDs may be useful for personalized medicine.One of the most relevant may be the alternating treatment design (Barlow & Hayes, 1979;Manolov et al., 2021), in which people are exposed to baseline and one or more treatments for very brief periods without the concern about stability before changing conditions.While the treatment period may be short, many more replications of treatments-and ineffective treatments-can be identified quickly.This type of design may be relevant for drugs that have rapid effects with a short half-life and behavioral interventions that have rapid effects (Coyle & Robertson, 1998)-for example, the effects of biofeedback on heart rate (Weems, 1998).Another design is the changing criterion design, in which experimental control is demonstrated when the outcome meets certain preselected criteria that can be systematically increased or decreased over time (Hartmann & Hall, 1976).The design is especially useful when learning a new skill or when outcomes change slowly over time (Singh & Leung, 1988)-for example, gradually increasing the range of foods chosen in a previously highly selective eater (Russo et al., 2019).

Integrating Single-Case Experimental Designs Into Randomized Controlled Trials
SCEDs can be integrated into RCTs to compare the efficacy of treatments chosen for someone based on SCEDs versus a standardized or usual care treatment (Epstein et al., 2021;Schork & Goetz, 2017).Such innovative designs may capture the best in SCEDs and randomized controlled designs.Kravitz et al. (2018) used an RCT in which one group (n = 108) experienced a series of reversal AB conditions, or a personalized (N-of-1) trial.The specific conditions were chosen for each patient from among eight categories of treatments to reduce chronic musculoskeletal pain (e.g., acetaminophen, any nonsteroidal anti-inflammatory drug, acetaminophen/oxycodone, tramadol).The other group (n = 107) received usual care.The study also incorporated mobile technology to record pain-related data daily (see Dallery et al., 2013, for a discussion of technology and SCEDs).The results suggested that the N-of-1 approach was feasible and acceptable, but it did not yield statistically significant superior results in pain measures compared to the usual care group.However, as noted by Vohra and Punja (2019), the results do not indicate a flaw in the methodological approach: finding that two treatments do not differ in superiority is a finding worth knowing.
Another example of a situation where an integrated approach may be useful is selecting a diet for weight control.Many diets for weight control that vary in their macronutrient intake-such as low carb, higher fat versus low fat, and higher carb-have their proponents and favorable biological mechanisms.However, direct comparisons of these diets basically show that they achieve similar weight control with large variability in outcome.Thus, while the average person on a low-fat diet does about the same as the average person on a lowcarb diet, some people on the low-carb diet do very well, while some fail.Some of the people who fail on the low-fat diet would undoubtedly do well on the low-carb diet, and some who fail on the low-fat diet would do well on the low-carb diet.Further, some would fail on both diets due to general problems in adherence.
Personalized medicine suggests that diets should be individualized to achieve the best results.SCEDs would be one way to show 'proof of concept' that a particular diet is better than a standard healthy diet.First, people would be randomized to experimental (including SCEDs) or control (not basing diet on SCEDs).Subject selection criteria would proceed as in any RCT.For the first 3 months, people in the experimental group would engage in individual reversal designs in which 2-week intervals of low-carb and low-fat diets would be interspersed with their usual eating, and weight loss, diet adherence, food preferences, and the reinforcing value of foods in the diet would be measured to assess biological, behavioral, and subjective changes.
Participants in the control group would experience a similar exposure to the different types of diets, but the diet to which they are assigned would be randomly chosen rather than chosen using SCED methods.In this way, they would have similar exposure to diets during the first 3 months of the study, but this experience would not impact group assignment.As with any RCT, the study would proceed with regular measures (e.g., 6, 12, 24 months) and the hypothesis that those assigned to a diet that results in better initial weight loss, and that they like and are motivated to continue, would do better than those receiving a randomly selected diet.The study could also be designed with three groups: a single-case design experimental group similar to the approach in the hypothetical study above and two control groups, one low-fat and one low-carb.
An alternative design would be to have everyone experience SCEDs for the first 3 months and then be randomized to either the optimal treatment identified during the first 3 months or an intervention randomly chosen among the interventions to be studied.This design has the advantage of randomization being after 3 months of study so that dropouts and nonadherers within the first 3 months would not be randomized in an intent-to-treat format.
The goal of either hypothesized study, or any study that attempts to incorporate SCEDs into RCTs, is that matching participants to treatments will provide superior results in comparison to providing the same treatment to everyone in a group.Two hypotheses can be generated in these types of designs: first, that the mean changes will differ between groups, and second, that the variability will differ between groups with less variability in outcome for people who have treatment selected after a single-case trial than people who have a treatment randomly selected.A reduction in variability plus mean differences in outcome should increase the effect size for people treated using individualized designs, increase power, and allow for a smaller sample size to ensure confidence about the differences observed between groups.

Limitations of Single-Case Experimental Designs
Single-case experimental designs have their common limitations.If a measure changes with repeated testing without intervention, it may not be useful for an SCED unless steps can be taken to mitigate such reactivity, such as more unobtrusive monitoring (Kazdin, 2021).Given that the effects of interventions are evaluated over time, systematic environmental changes or maturation could influence the relationship between a treatment and outcome and thereby obscure the effect of a treatment.However, the design logic of reversal and multiple baseline designs largely control for such influences.Since SCEDs rely on repeated measures and a detailed study of the relationship between treatment and outcome, studies that use dependent measures that cannot be sampled frequently are not candidates for SCEDs.Likewise, the failure to identify a temporal relationship between the introduction of treatment and initiation of change in the outcome can make attribution of changes to the intervention challenging.A confounding variable's association with introduction or removal of the intervention, which may cause inappropriate decisions about the effects of the intervention, is always possible.
Dropout or uncontrolled events that occur to individuals can introduce confounding variables to the SCED.
These problems are not unique to SCEDs and also occur with RCTs.

Single-Case Experimental Designs in Early Stage Translational Research
The emphasis of a research program may be on translating basic science findings to clinical interventions.The goal may be to collect early phase translational research as a step toward a fully powered RCT- (Epstein et al., 2021).The fact that a large amount of basic science does not get translated into clinical interventions is well known (Butler, 2008;Seyhan, 2019); this served in part as the stimulus for the National Institutes of Health (NIH) to develop a network of clinical and translational science institutes in medical schools and universities throughout the United States.A common approach to early phase translational research is to implement a small, underpowered RCT to secure a 'signal' of a treatment effect and an effect size.This is a problematic approach to pilot research, and it is not advocated by the NIH as an approach to early phase translational research (National Center for Complementary and Integrative Health, 2020).The number of participants needed for a fully powered RCT may be substantially different from the number projected from a small-sample RCT.These small, underpowered, early phase translational studies may provide too large an estimate of an effect size, leading to an underpowered RCT.Likewise, a small-sample RCT can lead to a small effect size that can, in turn, lead to a failure to implement a potentially effective intervention (Kraemer et al., 2006).Therefore, SCEDs-especially reversal and multiple baseline designs-are evidently ideally suited to early phase translational research.This use complements the utility of SCEDs for identifying the optimal treatment for an individual or small group of individuals.

Conclusion
Single-case experimental designs provide flexible, rigorous, and cost-effective approaches that can be used in personalized medicine to identify the optimal treatment for an individual patient.SCEDs represent a broad array of designs, and personalized (N-of-1) designs are a prominent example, particularly in medicine.These designs can be incorporated into RCTs, and they can be integrated using meta-analysis techniques.SCEDs should become a standard part of the toolbox for clinical researchers to improve clinical care for their patients, and they can lead to the next generation of interventions that show maximal effects for individual cases as well as for early phase translational research to clinical practice.

Figure 1 .
Figure 1.The main family of single-case experimental designs and nonexperimental designs.A = Baseline, B and C refer to different treatments.

Figure 2 .
Figure 2. Example of a reversal design showing replications within and between subjects.A1 = First Baseline, B1 First Treatment, A2 = Return to Baseline, B2 = Return to Treatment.P1-P4 represent different hypothetical participants.

Figure 3 .
Figure 3. Example of a multiple baseline design showing replications between subjects.P1-P3 represent different hypothetical participants.
return to placebo after blood pressures reach stability, drug Y is introduced in a staggered sequence, participants are returned to placebo, and the most effective intervention for each individual (drug X or Y) is reintroduced to replicate the most important result: the most effective medication.This across-subjects design establishes experimental control for two different new drug interventions across three people while also establishing experimental control for five comparisons within subjects (placebo-drug X, drug Y-placebo, placebo-drug Y, drug Y-placebo, placebo-more effective drug).Though this combined design strengthens confidence beyond either reversal or multiple baseline designs, in many situations, experimental control demonstrated using a reversal design is sufficient.

Figure 4 .
Figure 4. Example of a combined reversal and multiple baseline to determine the best drug to lower blood pressure.BL = Baseline.Drug X and Drug Y represent hypothetical drugs to lower blood pressure, and Best Drug represents a reversal to the most effective drug as identified for each hypothetical participant, labeled P1-P3.