Individualized Decision-Making Under Partial Identification: Three Perspectives, Two Optimality Results, and One Paradox

Unmeasured confounding is a threat to causal inference and gives rise to biased estimates. In this article, we consider the problem of individualized decision-making under partial identification. Firstly, we argue that when faced with unmeasured confounding, one should pursue individualized decision-making using partial identification in a comprehensive manner. We establish a formal link between individualized decision-making under partial identification and classical decision theory by considering a lower bound perspective of value/utility function. Secondly, building on this unified framework, we provide a novel minimax solution (i.e., a rule that minimizes the maximum regret for so-called opportunists) for individualized decision-making/policy assignment. Lastly, we provide an interesting paradox drawing on novel connections between two challenging domains, that is, individualized decision-making and unmeasured confounding. Although motivated by instrumental variable bounds, we emphasize that the general framework proposed in this article would in principle apply for a rich set of bounds that might be available under partial identification.

This article is © 2021 by author(s) as listed above.
The article is licensed under a Creative Commons Attribution (CC BY 4.0) International license (https://creativecommons.org/licenses/by/4.0/legalcode), except where otherwise indicated with respect to particular material included in the article. The article should be attributed to the author(s) identified above.

Media Summary
In the era of big data, observational studies are a treasure for both association analysis and causal inference, with the potential to improve decision-making. Depending on the set of assumptions one is willing to make, one might achieve either point, sign, or partial identification of causal effects. In particular, under partial identification, it might be inevitable to make suboptimal decisions. Policymakers caring about decision-making would face the following important question: What are optimal strategies corresponding to different risk preferences?
In this article, the author offers a unified framework that generalizes several decision-making strategies in the literature. Building on this unified framework, the author also provides a novel minimax solution (i.e., a rule that minimizes the maximum regret for so-called opportunists) for individualized decision-making and policy assignment.

arXiv:2110.10961v1 [stat.ME] 21 Oct 2021
1. The power of storytelling: different views might lead to different decisions

Suppose one is playing a two-armed slot machine. The rewards R_{−1} and R_1 are the payoffs for hitting the jackpot of each arm, respectively. For simplicity, let us assume that both arms always give positive rewards (R_{−1}, R_1 > 0), that is, one is guaranteed not to lose and therefore would not refrain from playing this game. However, due to some uncertainty, one does not have prior knowledge of the exact values of R_{−1} and R_1. Fortunately, suppose there is a magic instrument, which can help one to identify the range of rewards.
By only providing one with the left panel of Figure 1, that is, the range of R_1 − R_{−1}, most people might opt to pull arm −1. But wait a minute... where am I, and why am I looking at the left panel without knowing the real payoffs? After looking at the right panel, the decision might be changed depending on a person's risk preference. Is there such an instrument in real life? The answer is in the affirmative. One such instrument is a so-called instrumental variable (IV). In statistics and related disciplines, an IV method is used to estimate causal relationships when randomized experiments are not feasible or when there is noncompliance in a randomized experiment. Intuitively, a valid IV induces changes in the explanatory variable but otherwise has no direct effect on the dependent variable, allowing one to uncover the causal effect of the explanatory variable on the dependent variable. Under certain IV models, one can obtain bounds for counterfactual means. So how would one pursue decision-making when faced with partial identification? The rest of the article offers a comprehensive view of individualized decision-making under partial identification as well as several novel solutions to various decision- and policy-making strategies.
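To make the slot-machine story concrete, the following sketch contrasts a pessimist's and an optimist's choices given per-arm payoff ranges. The numeric ranges here are hypothetical stand-ins, not the actual numbers of Figure 1.

```python
def maximin_arm(r_minus, r_plus):
    """Pessimist: pick the arm whose worst-case payoff is larger.
    r_minus, r_plus are (low, high) payoff ranges for arms -1 and 1."""
    return 1 if r_plus[0] > r_minus[0] else -1

def maximax_arm(r_minus, r_plus):
    """Optimist: pick the arm whose best-case payoff is larger."""
    return 1 if r_plus[1] > r_minus[1] else -1

# Hypothetical ranges: arm -1 pays in [4, 5], arm 1 pays in [1, 9],
# so the difference R_1 - R_{-1} lies in [-4, 5].
r_minus, r_plus = (4.0, 5.0), (1.0, 9.0)
```

With these numbers a pessimist pulls arm −1 (worst case 4 versus 1) while an optimist pulls arm 1 (best case 9 versus 5), illustrating how the right panel, with per-arm ranges, can change a decision made from the difference range alone.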
1.1. Introduction.

An optimal decision rule provides a personalized action/treatment strategy for each participant in the population based on one's individual characteristics. A prevailing strand of work has been devoted to estimating optimal decision rules (Athey & Wager, 2021; Murphy, 2003; Murphy et al., 2001; Qian & Murphy, 2011; J. M. Robins, 2004; Zhao et al., 2012, and many others); we refer to Chakraborty and Moodie (2013), Kosorok and Laber (2019), and Tsiatis et al. (2019) for an up-to-date literature review on this topic.
Recently, there has been a fast-growing literature on estimating individualized decision rules based on observational studies subject to potential unmeasured confounding (Cui & Tchetgen Tchetgen, 2021a, 2021c; Han, 2019, 2020, 2021; Kallus et al., 2019; Kallus & Zhou, 2018; Qiu et al., 2021a, 2021b; Yadlowsky et al., 2018). In particular, Cui and Tchetgen Tchetgen (2021c) pointed out that one could identify treatment regimes that maximize lower bounds of the value function when one has only partial identification through an IV. Pu and Zhang (2021) further proposed an IV-optimality criterion to learn an optimal treatment regime, which essentially recommends the treatment, for patients for whom the estimated conditional average treatment effect bound covers zero, based on the length of the bounds, that is, based on the left panel of Figure 1. See more details in Cui and Tchetgen Tchetgen (2021a, 2021c) and Zhang and Pu (2021).
In this article, we provide a comprehensive view of individualized decision-making under partial identification through maximizing the lower bounds of the value function. This new perspective unifies various strategies in classical decision theory. Building on this unified framework, we also provide a novel minimax solution (for so-called opportunists who are unwilling to lose) for individualized decision-making and policy assignment. In addition, we point out that there is a mismatch between different optimality results, that is, an 'optimal' rule that attains one criterion does not necessarily attain the other. Such a mismatch is a distinctive feature of individualized decision-making under partial identification, and therefore makes the concept of universal optimality for decision-making under uncertainty ill-defined. Lastly, we provide a paradox to illustrate that a nonindividualized decision can conceivably lead to an outcome superior to an individualized decision under partial identification. The provided paradox also sheds light on using IV bounds as a sanity check or for policy improvement.
To conclude this section, we briefly introduce notation used throughout the article. Let Y denote the outcome of interest and A ∈ {−1, 1} be a binary action/treatment indicator. Throughout it is assumed that larger values of Y are more desirable. Suppose that U is an unmeasured confounder of the effect of A on Y . Suppose also that one has observed a pretreatment binary IV Z ∈ {−1, 1}. Let X denote a set of fully observed pre-IV covariates. Throughout, we assume the complete data are independent and identically distributed realizations of (Y, X, A, Z, U ); thus the observed data are (Y, X, A, Z).

A brief review of optimal decision rules with no unmeasured confounding
An individualized decision rule D is a mapping from the covariate space to the action space {−1, 1}. Suppose Y^a is a person's potential outcome under an intervention that sets A to value a, and Y^{D(X)} is the potential outcome under a hypothetical intervention that assigns A according to the rule D, that is, Y^{D(X)} = Y^1 I{D(X) = 1} + Y^{−1} I{D(X) = −1}, where E[Y^{D(X)}] is referred to as the value function (Qian & Murphy, 2011), and I{·} is the indicator function. Throughout the article, we make the following standard consistency and positivity assumptions: (1) For a given regime D, Y = Y^{D(X)} when A = D(X) almost surely. That is, a person's observed outcome matches his/her potential outcome under a given decision rule when the realized action matches his/her potential assignment under the rule; (2) We assume that Pr(A = a|X) > 0 for a = ±1 almost surely. That is, for any observed covariates X, a person has an opportunity to take either action.
We wish to identify an optimal decision rule D* that maximizes the value function, that is,

D* ∈ arg max_D E[Y^{D(X)}].    (1)

A significant amount of work has been devoted to estimating optimal decision rules relying on the following unconfoundedness assumption:

Assumption 1. (Unconfoundedness) Y^a ⫫ A | X for a = ±1.

The assumption essentially rules out the existence of an unmeasured factor U that confounds the effect of A on Y upon conditioning on X. It is straightforward to verify that under Assumption 1, one can identify the value function E[Y^{D(X)}] for a given decision rule D. Furthermore, the optimal decision rule in Equation (1) is identified from the observed data as D*(X) = sign{CATE(X)}, where CATE(X) = E(Y^1 − Y^{−1}|X) denotes the conditional average treatment effect (CATE). As established by Qian and Murphy (2011), learning optimal decision rules under Assumption 1 can be formulated as

D* ∈ arg max_D E[ I{A = D(X)} Y / Pr(A|X) ],

where Pr(A|X) is the probability of taking A given X, and one can directly maximize this value function over a parametrized set of decision rules. Rather than maximizing the above value function, Rubin and van der Laan (2012) and Zhao et al. (2012) transformed the above problem into a weighted classification problem,

D* ∈ arg min_D E[ I{A ≠ D(X)} Y / Pr(A|X) ].

The ensuing classification approach was shown to have appealing robustness properties, particularly in a randomized study where no model assumption on Y is needed.
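As a small illustration of the inverse-probability-weighted value function above, the following sketch estimates E[Y^{D(X)}] for the rule D ≡ 1 in a simulated randomized study; the data-generating mechanism is invented purely for illustration (Pr(A = 1|X) = 1/2 and E[Y|X, A] = 0.3 + 0.2 I{A = 1}).

```python
import random

def ipw_value(data, rule, propensity):
    """Estimate E[Y^{D(X)}] via the IPW formula E[I{A = D(X)} Y / Pr(A|X)].
    data is a list of (x, a, y) triples with a in {-1, 1}."""
    total = 0.0
    for x, a, y in data:
        if a == rule(x):
            p = propensity(x) if a == 1 else 1.0 - propensity(x)
            total += y / p
    return total / len(data)

random.seed(0)
propensity = lambda x: 0.5  # randomized study: Pr(A = 1|X) = 1/2

def draw(n):
    data = []
    for _ in range(n):
        x = random.choice([0, 1])
        a = 1 if random.random() < propensity(x) else -1
        # E[Y|X, A] = 0.3 + 0.2 * I{A = 1}: treatment always helps
        y = 1.0 if random.random() < 0.3 + (0.2 if a == 1 else 0.0) else 0.0
        data.append((x, a, y))
    return data

value_treat_all = ipw_value(draw(20000), lambda x: 1, propensity)
```

Here the true value of the treat-everyone rule is E[Y^1] = 0.5, and the IPW estimate should land close to it.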

Instrumental variable with partial identification
In this section, instead of relying on Assumption 1, we allow for unmeasured confounding, which might cause biased estimates of optimal decision rules. Let Y^{z,a} denote the potential outcome had, possibly contrary to fact, a person's IV and treatment value been set to z and a, respectively. Suppose that the following assumption holds:

Assumption 2. (Latent unconfoundedness) Y^{z,a} ⫫ (Z, A) | (X, U) for all z and a.

This assumption essentially states that together U and X would in principle suffice to account for any confounding bias. Because U is not observed, we propose to account for it when a valid IV Z is available that satisfies the following standard IV assumptions (Cui & Tchetgen Tchetgen, 2021c):

Assumption 3. (IV relevance) Z is associated with A conditional on X.

Assumption 4. (Exclusion restriction) Y^{z,a} = Y^a almost surely, for all z and a.

Assumption 5. (IV independence) Z ⫫ U | X.

Assumption 6. (IV positivity) 0 < Pr(Z = 1|X) < 1 almost surely.
Assumptions 3-5 are well-known IV conditions, while Assumption 6 is needed for nonparametric identification (Angrist et al., 1996; Greenland, 2000; Hernan & Robins, 2006; Imbens & Angrist, 1994). Assumption 3 requires that the IV is associated with the treatment conditional on X. Note that Assumption 3 does not rule out confounding of the Z-A association by an unmeasured factor; however, if present, such a factor must be independent of U. Assumption 4 states that there can be no direct causal effect of Z on Y not mediated by A. Assumption 5 states that the IV is independent of the unmeasured confounder U conditional on X. Figure 2 provides a graphical representation of Assumptions 4 and 5.
Figure 2. A causal graph with unmeasured confounding U. The bi-directed arrow between Z and A indicates the possibility that there may be unmeasured common causes confounding their association.
While Assumptions 3-6 together do not suffice for point identification of the counterfactual mean and average treatment effect, a valid IV, even under these minimal four assumptions, can partially identify them, that is, lower and upper bounds might be formed. Let L_{−1}(X), U_{−1}(X), L_1(X), U_1(X) denote lower and upper bounds for E(Y^{−1}|X) and E(Y^1|X), respectively; hereafter, we consider the lower and upper bounds

L(X) = L_1(X) − U_{−1}(X) and U(X) = U_1(X) − L_{−1}(X)

for E(Y^1 − Y^{−1}|X). Sharp bounds for E(Y^1 − Y^{−1}|X) in certain prominent IV models have been shown to take such a form; see, for instance, the Robins-Manski bound (Manski, 1990; J. Robins, 1989), the Balke-Pearl bound (Balke & Pearl, 1997), the Manski-Pepper bound under a monotone IV assumption (Manski & Pepper, 2000), and many others. Here, we consider the conditional Balke-Pearl bounds (Cui & Tchetgen Tchetgen, 2021c) for a binary outcome as our running example: letting p_{y,a,z,x} denote Pr(Y = y, A = a|Z = z, X = x), each of L_{−1}(X), U_{−1}(X), L_1(X), U_1(X) is a maximum or minimum of several linear combinations of the p_{y,a,z,x}; the explicit expressions are given in Balke and Pearl (1997) and Cui and Tchetgen Tchetgen (2021c). Additionally, one could proceed with other partial identification assumptions and corresponding bounds. We refer to references cited in Balke and Pearl (1997) and a review paper by Swanson et al. (2018) for alternative bounds.
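As a simple illustration of bounds of this form, the sketch below computes Manski-style bounds on each arm's counterfactual mean, intersected over the two IV levels, from the strata probabilities p_{y,a,z,x}. These share the structure L(X) = L_1(X) − U_{−1}(X), U(X) = U_1(X) − L_{−1}(X) but are in general wider than the sharp Balke-Pearl bounds, whose explicit expressions are not reproduced here.

```python
def iv_bounds_arm(p, a):
    """Bounds on E(Y^a | x) for binary Y from p[(y, a, z)] = Pr(Y=y, A=a | Z=z, x).
    Manski-style bounds intersected over z; valid under IV independence,
    but generally wider than the sharp Balke-Pearl bounds."""
    lows, highs = [], []
    for z in (-1, 1):
        p_y1_a = p[(1, a, z)]                    # Pr(Y = 1, A = a | Z = z)
        p_other = p[(0, -a, z)] + p[(1, -a, z)]  # Pr(A != a | Z = z)
        lows.append(p_y1_a)               # unobserved arm contributes >= 0
        highs.append(p_y1_a + p_other)    # unobserved arm contributes <= 1
    return max(lows), min(highs)

def iv_bounds_cate(p):
    """L(x) = L_1(x) - U_{-1}(x), U(x) = U_1(x) - L_{-1}(x)."""
    l1, u1 = iv_bounds_arm(p, 1)
    lm, um = iv_bounds_arm(p, -1)
    return l1 - um, u1 - lm
```

When the IV carries no information (the same strata probabilities at both z), these reduce to the natural Manski bounds; a stronger IV tightens the intersection.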
We conclude this section by providing multiple settings in real life where an IV is available but Assumption 1 is not likely to hold: 1) In a double-blind placebo-randomized trial in which participants are subject to noncompliance, the treatment assignment is a valid IV; 2) Another classical example is that in sequential, multiple assignment, randomized trials (SMARTs) in which patients are subject to noncompliance, the adaptive intervention is a valid IV. We note that the randomized minimax solution proposed later in Section 5.3 offers a promising strategy for this setting; 3) In social studies, a classical example is estimating the causal effect of education on earnings, where residential proximity to a college is a valid IV. We further elaborate on the third example in the next section.

A real-world example
In this section, we first consider a real-world application on the effect of education on earnings using data from the National Longitudinal Study of Young Men (Card, 1993;Okui et al., 2012;Tan, 2006;Wang et al., 2017;Wang & Tchetgen Tchetgen, 2018), which consist of 5,525 participants aged between 14 and 24 in 1966. Among them, 3,010 provided valid education and wage responses in the 1976 follow-up. Following Tan (2006) and Wang and Tchetgen Tchetgen (2018), we consider education beyond high school as a binary action/treatment (i.e., A). A practically relevant question is the following: Which students would be better off starting college to maximize their earnings?
In this study, there might be unmeasured confounders even after adjusting for observed covariates; for example, unobserved preference for education levels is an unmeasured factor that is likely to be associated with both education and wage. We follow Card (1993), Wang et al. (2017), and Wang and Tchetgen Tchetgen (2018) and use the presence of a nearby four-year college as an instrument (i.e., Z). In this data set, 2,053 (68.2%) lived close to a four-year college, and 1,521 (50.5%) had education beyond high school. To illustrate the IV bounds with binary outcomes, we follow Wang et al. (2017) and Wang and Tchetgen Tchetgen (2018) and dichotomize the outcome wage (i.e., Y) at its median, that is, 5.375 dollars per hour. While we only use this as an illustrative example, we note that dichotomizing earnings might affect decision-making, and therefore in practice one might conduct a sensitivity analysis around the choice of cut-off. Following Wang and Tchetgen Tchetgen (2018), we adjust for age, race, father's and mother's education levels, indicators for residence in the south and in a metropolitan area, and IQ scores (i.e., X), all measured in 1966. Among them, race, parents' education levels, and residence are included as they may affect both the IV and the outcome; age is included as it is likely to modify the effect of education on earnings; and IQ scores, as a measure of underlying ability, are included as they may modify both the effect of proximity to college on education and the effect of education on earnings.
We use random forests to estimate the probabilities p_{y,a,z,x} (with default tuning parameters in Liaw & Wiener, 2002) and then construct estimates of the Balke-Pearl bounds L_{−1}(X), U_{−1}(X), L_1(X), U_1(X), L(X), U(X). To streamline our presentation, we consider the subset of individuals of age 15, parents' education level 11 years, non-Black, and residence in a non-south and metropolitan area. Their IV CATE and counterfactual mean bounds L(X), U(X), L_{−1}(X), U_{−1}(X), L_1(X), U_1(X) are presented in Figure 3.
The shape of the IV bounds looks similar to the slot machine example of Figure 1 given at the beginning of the article. When faced with uncertainty, what are the different decision-making strategies? In the next section, we provide a new perspective on optimal decision-making under partial identification beyond just looking at the contrast or value function. Except for the real-world example, for pedagogical purposes, we focus on the population-level IV bounds instead of their empirical analogs throughout.

The lower bound perspective: A unified criterion
Figure 3. IV CATE and counterfactual mean bounds for two subjects with IQ scores 84.00 and 102.45, where A = 1 and −1 refer to education beyond high school or not, respectively.

In Section 5.1, we link the lower bound framework to well-established decision theory from an investigator's perspective. In Section 5.2, we extend our framework to take into account individual preferences of participants. In Section 5.3, we provide a formal solution
to achieve a minimax regret goal by leveraging a randomization scheme. In Section 5.4, we reveal a mismatch between deterministic/randomized minimax regret and maximin utility, and conclude that there is no universal concept of optimality for decision-making under partial identification.

For a rule D, one can form lower bounds of the value function E[Y^{D(X)}] as w(x)-weighted combinations of the available bounds; the set of rules maximizing such a lower bound, where w(x) can depend on D(x) and 0 ≤ w(x) ≤ 1 for any x, is denoted by D_opt. The derivation of the lower bounds of E[Y^{D(X)}] is provided in the Appendix. Hereinafter, we refer to the decision-making strategy derived from D_opt as the lower bound criterion, where, as can be seen later, w(x) reflects the investigator's preferences.
In Table 1, we provide examples of decision-making criteria that have previously appeared in classical decision theory, and we connect each such criterion to a corresponding w(x). Hereafter, for a rule D, we formally define utility as the value function E[Y^{D(X)}] and regret as the shortfall of E[Y^{D(X)}] relative to the value of an optimal rule under the true (but only partially identified) distribution. We give the formal definition of each rule in Table 1, except that the mixed strategy is deferred to Section 5.3. In the following definitions, a min or max without an argument is taken with respect to all data-generating distributions consistent with the bounds. For example, for the left panel of Figure 3, the maximax utility criterion recommends A = 1; the maximin utility criterion recommends A = −1; the minimax regret criterion recommends A = −1.
Notably, all criteria in Table 1 reduce to D* under point identification. For a more complete treatment of decision-making strategies and formal axioms of rational choice, we refer to Arrow and Hurwicz (1972). Interestingly, we note that a (deterministic) minimax regret criterion coincides with the Hurwicz criterion with α = 1/2, as L(X) = L_1(X) − U_{−1}(X) and U(X) = U_1(X) − L_{−1}(X).
Remark 1. While both the lower bound criterion and the Hurwicz criterion have an index, they are conceptually and technically different. The index w(x), being a number between 0 and 1, refers to the preference over actions; with w(x) being a weighted average of I(P < Q) and I(P > Q), the lower bound criterion balances pessimism and optimism; however, it may not be straightforward for the Hurwicz criterion to balance preferences over treatments/actions.
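A minimal sketch of how the classical criteria act on the four numbers {L_1(x), U_1(x), L_{−1}(x), U_{−1}(x)}; the definitions used here (per-arm worst case for maximin, per-arm best case for maximax, and a comparison of max{−L, 0} with max{U, 0} for deterministic minimax regret) are one standard concrete reading of Table 1.

```python
def decisions(l1, u1, lm, um):
    """Apply classical criteria to bounds on E(Y^1|x) in [l1, u1] and
    E(Y^{-1}|x) in [lm, um]; returns the recommended arm (+1 or -1)."""
    L, U = l1 - um, u1 - lm                    # bounds on the CATE
    maximax = 1 if u1 > um else -1             # optimist: best best-case
    maximin = 1 if l1 > lm else -1             # pessimist: best worst-case
    # deterministic minimax regret: worst-case regret of A=1 is max(-L, 0),
    # of A=-1 is max(U, 0); pick the arm with the smaller worst-case regret
    minimax_regret = 1 if max(-L, 0.0) < max(U, 0.0) else -1
    return {"maximax": maximax, "maximin": maximin,
            "minimax_regret": minimax_regret}
```

For instance, with hypothetical bounds l1 = 0.1, u1 = 0.9 and lm = 0.3, um = 0.8, the three criteria recommend 1, −1, and −1 respectively, the same qualitative pattern described for the left panel of Figure 3.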

5.2. Incorporating individualized preferences: numeric/symbolic/stochastic inputs.

We note that the lower bound criterion also sheds light on the process of data collection for individualized decision-making. As individuals in the population of interest may ultimately exhibit different preferences when selecting optimal decisions, it may be unreasonable to assume that all participants share a common preference for evaluating the optimality of an individualized decision rule under partial identification. An investigator might collect participants' risk preferences over the space of rational choices to construct an individualized decision rule. Therefore, we use the subscript r (a participant's observed preference) to remind ourselves that w_r(x) depends not only on x but also on an individual's risk preference, that is, r ∈ R determines a specific form of w_r(x) (see Table 1), where R is a collection of different risk preferences. Such w_r(x) results in a decision rule D(w_r(x), x) depending on both x (standard individualization, e.g., in the sense of subgroup identification) and r (individualized risk preferences when faced with uncertainty), where r can be collected from each individual.
Remark 2. We note that part of the elegance of this lower bound framework is that the risk preference does not come into play if there is no uncertainty about the optimal decision, that is, if 0 ∉ (L(x), U(x)).

Remarkably, the recorded index w_r(x) for each x could be numeric/symbolic/stochastic, that is, fall into any of the following three categories, while the participants only need to specify a category and input a number between 0 and 1 if one of the first two categories is chosen:

• Treatment/action preferences: Input a number β between 0 and 1, which indicates a preference over treatments/actions, with larger β in favor of A = 1. Here, w_r(x) = β.
In observational studies, most applied researchers, upon observing 0 ∈ (L(x), U(x)), would rely on the standard of care (A = −1) and opt to wait for more conclusive studies, which corresponds to β = 0. In a placebo-controlled study with A = −1 denoting placebo, β = 0 represents concerns about safety/aversion to treatment.

• Utility/risk preferences: Input a number β between 0 and 1 and let the symbolic input be w_r(x) = βI(P > Q) + [1 − β]I(P < Q), where β refers to the coefficient of optimism. For instance, β = 0 puts the emphasis on the worst possible outcome and refers to risk aversion; likewise, β = 1/2 and β = 1 refer to risk-neutral and risk-taking preferences, respectively.

• An option for opportunists who are unwilling to lose: Render w_r(x) random as a Bernoulli random variable; see Section 5.3 for details.
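The three input categories can be sketched as follows. This is only an illustrative encoding: the quantities I(P > Q) and I(P < Q) are represented here through the sign of the midpoint of (L, U), which is an assumption of this sketch rather than the article's definition, and `p_star` stands for the Section 5.3 probability, passed in rather than derived.

```python
import random

def preference_index(category, beta=None, L=None, U=None, p_star=None, rng=random):
    """Return w_r(x) for the three input categories described above."""
    if category == "action":        # numeric: direct preference for A = 1
        return beta
    if category == "risk":          # symbolic: beta = coefficient of optimism
        # assumption: encode I(P > Q) via the sign of the interval midpoint
        p_gt_q = 1.0 if (L + U) / 2.0 > 0 else 0.0
        return beta * p_gt_q + (1.0 - beta) * (1.0 - p_gt_q)
    if category == "opportunist":   # stochastic: Bernoulli(p_star) draw
        return 1.0 if rng.random() < p_star else 0.0
    raise ValueError(category)
```

A risk-averse participant (β = 0) facing an interval whose midpoint favors treatment thus gets w_r(x) = 0, while a risk taker (β = 1) gets w_r(x) = 1.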
We highlight that the proposed index w r (x) unifies various concepts in artificial intelligence, economics, and statistics, which holds promise for providing a satisfactory regime for each individual through machine intelligence.

A randomized minimax regret solution for opportunists.
In this section, we consider whether an investigator/participant who happens to be an opportunist can do better in terms of protecting against the worst-case regret than the minimax regret approach in Table 1.
An opportunist might not put all of his or her eggs in one basket. This mixed strategy is also known as a mixed portfolio in portfolio optimization. Let p(x) denote the probability of taking A = 1 given X = x. By the definition of the minimax regret criterion, one essentially needs to solve the following for p(x):

min_{p(x)} max{ [1 − p(x)] max{U(x), 0}, p(x) max{−L(x), 0} },

whose solution is

p*(x) = max{U(x), 0} / [max{U(x), 0} + max{−L(x), 0}]

whenever the denominator is nonzero. Such a choice of p*(x) guarantees a worst-case regret no more than

max{U(x), 0} max{−L(x), 0} / [max{U(x), 0} + max{−L(x), 0}].

We formalize the above result in the following theorem.
Theorem 5.1. Define the stochastic policy D as D(x) = 1 with probability p*(x); the corresponding regret is bounded by max{U(x), 0} max{−L(x), 0} / [max{U(x), 0} + max{−L(x), 0}]. In contrast, by only considering deterministic rules, a minimax regret approach guarantees a worst-case regret for X = x of no more than min(max{U(x), 0}, max{−L(x), 0}).

It is clear that

max{U(x), 0} max{−L(x), 0} / [max{U(x), 0} + max{−L(x), 0}] ≤ min(max{U(x), 0}, max{−L(x), 0}),

since ab/(a + b) ≤ min(a, b) for any a, b ≥ 0 with a + b > 0.
Therefore, the proposed mixed strategy gives a sharper minimax regret bound than those of Zhang and Pu (2021) and Pu and Zhang (2021), and is therefore sharper than that of any deterministic rule.
Remark 3. The result in this section does not necessarily rely on L(x) being defined as L_1(x) − U_{−1}(x) and U(x) as U_1(x) − L_{−1}(x); any valid lower and upper bounds on the conditional average treatment effect suffice.

Remark 4. The proposed mixed strategy leads to w(x) or w_r(x) being a Bernoulli random variable with probability p*(x), and therefore a stochastic rule D(w(x), x) or D(w_r(x), x) assigning 1 with probability p*(x). Note that w_r(x) being a Bernoulli random variable with parameter p(x) and w_r(x) being a scalar p(x) are fundamentally different: the former provides a stochastic decision rule, in other words, participants with the same x can receive different recommendations; the latter leads to a deterministic rule, that is, all participants with the same x receive the same recommendation.
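The mixed strategy can be sketched directly from the closed form p*(x) = max{U, 0}/[max{U, 0} + max{−L, 0}] obtained by equalizing the two worst-case regrets; the numerical check below confirms that the randomized worst-case regret never exceeds the deterministic one.

```python
def worst_case_regrets(L, U):
    """Given CATE bounds L <= U at x, return (randomized worst-case regret,
    deterministic worst-case regret, p_star)."""
    u_plus, l_minus = max(U, 0.0), max(-L, 0.0)
    det = min(u_plus, l_minus)            # best deterministic guarantee
    if u_plus + l_minus == 0.0:           # L = U = 0: no possible regret
        return 0.0, det, 0.5
    p_star = u_plus / (u_plus + l_minus)
    # assigning A=1 with prob p: regret <= max((1-p)*u_plus, p*l_minus)
    rand = max((1.0 - p_star) * u_plus, p_star * l_minus)
    return rand, det, p_star
```

For example, with L = −0.3 and U = 0.6, p* = 2/3 and the randomized worst-case regret is 0.2, versus 0.3 for the best deterministic rule; when the bounds sign the effect (e.g., L > 0), both regrets collapse to zero.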

No universal optimality for decision-making under partial identification.
As can be easily seen from Table 1 as well as Section 5.3, there is a mismatch between deterministic/randomized minimax regret and maximin utility. In fact, each of the three rules corresponds to a different decision strategy. Such a mismatch is a distinctive feature of partial identification. On the one hand, it is notable that {L(x), U(x)} provides complementary information to the analyst, as it might inform the analyst as to when he/she might refrain from making a decision; mainly, if such an interval includes zero, there is no evidence in the data as to whether the action/treatment is on average beneficial or harmful for individuals with that value of x. One might need to conduct randomized experiments in order to draw a causal conclusion if 0 ∈ (L(x), U(x)). On the other hand, decision-making must in general be considered a game of four numbers {L_1(x), L_{−1}(x), L(x), U(x)} rather than two: for example, the maximin utility criterion compares L_1(x) with L_{−1}(x), whereas the minimax regret criterion compares max{U(x), 0} with max{−L(x), 0}. From the above point of view, the concept of optimality of a decision rule under partial identification cannot be absolute; rather, it is relative to a particular choice of decision-making criterion, whether it is minimax, maximax, maximin, and so on. Furthermore, an individualized decision rule might incorporate participants' risk preferences, as it might be unreasonable to assume everyone shares a common preference. In the Appendix, we provide expressions for the minimum utility, maximum regret, and maximum misclassification rate of certain 'optimal' rules in Table 1 (including the maximin utility and deterministic/randomized minimax regret rules) for practical use.

A paradox: 1+1<2
In this section, we provide an interesting paradox regarding the use of partial identification to conduct individualized decision-making. To streamline our presentation, we use the (deterministic) minimax regret rule as a running example; however, any rule D ∈ D_opt can suffer the same paradox. To simplify exposition, we consider the case in which, unbeknownst to the analyst, unmeasured confounding is absent (there is no U). We consider the following model with covariate X (e.g., female/male) distributed on {0, 1} with equal probabilities,

Pr(Y = 1|X, A) = X/16 + A/5 + 1/15,
Pr(A = 1|X, Z) = X/16 + 2Z/5 + 1/2,

with a slight abuse of notation, using 0, 1 coding for Z and A here. It is easy to see that the optimal rule is D* = 1 for the entire population. After a simple calculation, the Balke-Pearl conditional average treatment effect bounds for X = 0, 1 both contain zero, with |L(0)| < |U(0)| and |L(1)| > |U(1)|. The Balke-Pearl average treatment effect bounds marginalizing over X also contain zero, with |L| < |U|.
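The model above can be coded directly. The check below verifies that the true CATE equals 1/5 in both strata (so D* = 1 population-wide) and produces the strata probabilities p_{y,a,z,x} from which the Balke-Pearl bounds cited in the text would be computed; the bound computation itself is omitted here.

```python
def pr_y1(x, a):
    """Pr(Y = 1 | X = x, A = a), with a in {0, 1} as in the text."""
    return x / 16 + a / 5 + 1 / 15

def pr_a1(x, z):
    """Pr(A = 1 | X = x, Z = z), with z in {0, 1} as in the text."""
    return x / 16 + 2 * z / 5 + 1 / 2

def p_yaz(x):
    """Return p[(y, a, z)] = Pr(Y = y, A = a | Z = z, X = x); with no U,
    Y depends only on (X, A), so the joint factorizes."""
    p = {}
    for z in (0, 1):
        for a in (0, 1):
            pa = pr_a1(x, z) if a == 1 else 1 - pr_a1(x, z)
            py = pr_y1(x, a)
            p[(1, a, z)] = pa * py
            p[(0, a, z)] = pa * (1 - py)
    return p

cate = {x: pr_y1(x, 1) - pr_y1(x, 0) for x in (0, 1)}  # = 1/5 for both x
```

Since the CATE is 0.2 for both X = 0 and X = 1, any bound-based rule that recommends A = −1 for some stratum necessarily contradicts the truth, which is exactly the point of the paradox.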
As it is unbeknownst to the analyst whether unmeasured confounding is present or whether X is an effect modifier, there are several possible strategies for analyzing the data.
(1) If one is concerned about individualized decision-making but does not worry about unmeasured confounding, one runs a standard regression-type analysis and gets the right answer.
(2) If one is concerned about unmeasured confounding but is only interested in decision-making at the population level (i.e., based on an average treatment effect analysis), one can obtain IV bounds on the average treatment effect and also get the right answer.
(3) If one is concerned about individualized decision-making and also worries about unmeasured confounding, one gets the wrong answer for a subgroup.
We summarize the results of the above strategies of analysis in Table 2.

Table 2. Correct/incorrect decisions using three types of data analyses.
As can be seen from the table, mixing up two very difficult domains (individualized recommendation + unmeasured confounding) might make life harder (1 + 1 < 2). There are several lessons one can learn from this paradox: a) A comparison between (1) and (3): It would be a good idea to first conduct a standard analysis (e.g., assuming Assumption 1) or other point identification approaches (e.g., assuming Assumption 7 of Cui & Tchetgen Tchetgen, 2021c) and then use IV bounds as a sanity check or, say, for policy improvement; b) A comparison between (2) and (3): The paradox sheds light on the clear need to carefully distinguish variables used to make individualized decisions from variables used to address confounding concerns; similar to but different from Simpson's paradox, the aggregated and disaggregated answers can be opposite for a substantial subgroup; c) (3) by itself: It might be a rather risky undertaking to narrow down an interval estimate to a definite decision given the overwhelming uncertainty; overly accounting for unmeasured confounding might erroneously recommend a suboptimal decision to a subgroup.
As motivated by the comparison between (1) and (3), we formalize the policy improvement idea following Kallus and Zhou (2018). Note that minimizing the worst-case possible regret against a baseline policy D_0 would improve upon the decisions of those individuals for whom D_0(X) = −1 but L(X) > 0, or D_0(X) = 1 but U(X) < 0. We revisit the real data example in Section 4. We first run a standard analysis (random forest: Y on X, A) and obtain D_0(X) = sign{Pr(Y = 1|X, A = 1) − Pr(Y = 1|X, A = −1)}; among 3,010 subjects, 2,106 have D_0(X) = 1 and 904 have D_0(X) = −1. Then we calculate the IV conditional average treatment effect bounds; there are 323 subjects with L(X) > 0 and 45 subjects with U(X) < 0. Finally, we use the IV bounds as a sanity check/improvement: only 4 subjects with D_0(X) = −1 switch to 1, and 8 subjects with D_0(X) = 1 switch to −1. Therefore, for most subjects in this application, the IV bounds do not invalidate the standard regression analysis, while the IV bounds are still helpful to validate/invalidate decisions for a subgroup.
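The improvement step described above amounts to overriding D_0 only where the IV bounds identify the sign of the conditional average treatment effect; a minimal sketch:

```python
def improve_policy(d0, L, U):
    """Override a baseline decision d0 in {-1, 1} at a covariate value x
    only when the IV CATE bounds (L, U) at x sign the effect."""
    if L > 0:        # effect surely positive: treat
        return 1
    if U < 0:        # effect surely negative: do not treat
        return -1
    return d0        # bounds cover zero: keep the baseline decision
```

Applied subject by subject, this reproduces the logic of the data example: only subjects with D_0(X) = −1, L(X) > 0 or D_0(X) = 1, U(X) < 0 switch, and everyone else keeps the baseline recommendation.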

Discussion
In this article, we illustrated how one might pursue individualized decision-making using partial identification in a comprehensive manner. We established a formal link between individualized decision-making under partial identification and classical decision theory by considering a lower bound perspective of value/utility function. Building on this unified framework, we provided a novel minimax solution for opportunists who are unwilling to lose. We also pointed out that there is a mismatch between maximin utility and minimax regret. Moreover, we provided an interesting paradox to ground several interesting ideas on individualized decision-making and unmeasured confounding. To conclude, we list the following points that might be worth considering in future research.
• As the proper use of multiple IVs is of growing interest in many applications, including statistical genetics studies, one could possibly construct multiple IVs and then obtain multiple bounds to conduct a better sanity check or improvement. Another possibility is to strengthen multiple IVs (Ertefaie et al., 2018; Zubizarreta et al., 2013). A stronger IV might provide a tighter bound, and therefore sign identification may be achieved (Cui & Tchetgen Tchetgen, 2021b).
• Including additional covariates which are associated with A or Y for stratification and then marginalizing over these covariates would potentially give a tighter bound. Therefore, carefully choosing variables used to stratify (which can be the same as decision variables or a larger set of variables) might be of interest for both theoretical and practical purposes.
• The proposed randomized minimax regret method and the other strategies in Table 1 might be of interest in optimal control settings such as reinforcement learning and contextual bandits, where exploitation and exploration are under consideration. In addition, given observational data in which a potential IV is available, one can use different strategies to construct an initial randomized policy for use in a reinforcement learning or bandit algorithm.
• One important difference between decision-making with IV partial identification and classical decision theory is the source of uncertainty. For the former, unmeasured confounding creates uncertainty, and overthinking confounding might create overwhelming uncertainty. Therefore, to better assess the uncertainty, it would also be of great interest to formalize a sensitivity analysis procedure for point identification, such as under assumptions of no unmeasured confounding or no unmeasured common effect modifiers (Cui & Tchetgen Tchetgen, 2021c). A similar question has also been raised by Han (2021).
Acknowledgments. The author is thankful to the three referees, the associate editor, and the Editor-in-Chief for useful comments, which led to an improved manuscript.
Appendix A. Derivation of lower bounds of the value function

The following was originally derived in Cui and Tchetgen Tchetgen (2021c). It is helpful to provide it here.