This piece is a commentary on the article “The Age of Secrecy and Unfairness in Recidivism Prediction.”
Rudin, Wang, and Coker (2020, henceforth RWC) present a convincing argument against black box algorithms like COMPAS that are sometimes used in the United States to help judges sentence convicted offenders in court. They point out that the lack of transparency means that defendants (and victims) cannot assess the accuracy of the score driving the decision, and researchers cannot accurately assess the fairness of any given decision rule. They then argue that these black box algorithms should be replaced with simple and transparent risk assessment algorithms that are based primarily on age and criminal history. Prior work has shown that these tools perform at least as well as the more costly proprietary tools.
Although the article is focused narrowly on proprietary risk tools like COMPAS, RWC’s argument potentially has a much broader scope. As they discuss in their final section, the RWC argument that black box algorithms are not fair applies not only to COMPAS but also to all discretionary sentencing done by judges. RWC state:
Interestingly, a system that relies only on judges—and does not use machine learning at all—has similar disadvantages to COMPAS; the thought processes of judges is (like COMPAS) a black box that provides inconsistent error-prone decisions. Removing COMPAS from the criminal justice system, without a transparent alternative, would still leave us with a black box.
Prior work has shown not only that judges are ‘black boxes,’ but that they are also not very good at identifying high-risk offenders (Gottfredson, 1999). An extension of the RWC argument then, if I might be allowed to take the argument to an extreme that RWC did not advocate, would replace all black box algorithms, including judges, with a simple risk tool that uses age and criminal history to assign sentences.
The recognition by RWC that judges are also poorly performing black boxes connects the RWC discussion back to the sentencing reform movement that began to build momentum in the 1960s (Aharonson, 2013; Spohn, 2008). The U.S. sentencing system was then best described as an indeterminate sentencing structure focused on the goal of rehabilitation. Judges sentenced offenders to broad ranges, and parole boards made the final decisions about sentence length based on their assessment of an individual’s rehabilitation. This individualized and subjective approach was often criticized because it led to widely disparate outcomes that defendants were not able to contest. Criticism of this sentencing regime crystallized in a short but widely read book by Judge Marvin Frankel called Criminal Sentences: Law Without Order (Frankel, 1973). The current fairness argument against COMPAS by RWC is reminiscent of Frankel’s argument that the indeterminate system is nontransparent and nonreviewable and therefore inherently unfair. The main difference is that while RWC talk about one black box (COMPAS), Frankel talks about many different black boxes (judges).
Frankel’s solution was the creation of “a detailed chart or calculus to be used (1) by the sentencing judge in weighing the many elements that go into the sentence; (2) by lawyers, probation officers and others undertaking to persuade or enlighten the judge; and (3) by appellate courts in reviewing what the judge has done” (Frankel, 1973, p. 113). The suggestion of a chart or “calculus” is similar to RWC’s call for a simple, transparent tool. The emphasis on risk, based primarily on prior history, also coincides nicely with the final result of the guidelines movement started by Frankel, who is sometimes called the Father of Guidelines (Adelman & Deitrich, 2008).
According to Harcourt (2015), the U.S. Sentencing Commission initially set out to produce an actuarial risk tool. Although they eventually abandoned this idea as too complex, the commission was heavily influenced by ideas about selective incapacitation, which developed after Frankel’s original book was published (Harcourt, 2015). Prior history was used by the commission specifically because of existing research linking prior history to risk (Robinson, 2001). As a result of this initial effort, prior history is a prominent feature of all state sentencing guideline systems, as well as many mandatory minimum sentencing schemes for chronic offenders.
RWC go beyond making prior history a prominent feature, and instead advocate a system that relies almost solely on prior history and age on the basis of a horse race between COMPAS and their preferred risk tool. While I agree wholeheartedly with RWC that transparent tools are better than black box ones, I believe that our experience with prior history over the last 30 years should raise some red flags about this recommendation. Harcourt (2015) identified the high correlation between race and prior history as one reason to be cautious. In the remainder of this discussion, I identify three other important concerns about sentencing policies that rely heavily on prior history.
First, prior history is an endogenous determinant of future criminal involvement. For example, judges and juries appear to use criminal history as a measure of culpability, particularly in marginal cases (Eisenberg & Hans, 2009). There is also now causal evidence not only that a criminal record can lead to problems in the labor market through the explicit action of the government (Chin, 2012), but that those problems can then lead directly to additional crime (Denver, Siwach, & Bushway, 2017). In this case, the prior record is not capturing some inherent risk of the individual, but the risk created through labeling by the criminal justice system. Although the full extent of this endogeneity is not yet well understood, the endogeneity itself is well established. Sentencing systems that focus on prior history run the risk of becoming self-fulfilling prophecies.
Second, the finding that prior history is the main factor predicting risk is the result of risk prediction exercises that do not account for the current penal treatment. Usually, risk prediction is conducted on a population prior to treatment (Bushway & Smith, 2007). However, in the case of the criminal justice system, risk prediction is conducted on a sample of people who have already been sentenced (that is, ‘treated’) differentially on the basis of risk. In other words, the data used by RWC to evaluate risk come from a group of convicted offenders who have all been treated differently based both on COMPAS and on the judge’s assessment of risk. Evaluating risk in an environment of differential treatment has long been understood to create endogeneity (Gottfredson, 1999) and flawed inference about the relevance of a given factor (Bushway & Smith, 2007). The RWC conclusion that their tool performs better than COMPAS should be modified to say that, conditional on the use of COMPAS and individual discretion, the proposed risk tool performed better than COMPAS. This does not mean that the two algorithms would have performed the same way if they had been implemented in two independent universes with the same initial conditions.
Consider the experience of Virginia. Virginia became one of the first states to use a transparent risk assessment instrument as a formal part of sentencing. In the initial risk assessment, created in a world without formal risk assessment, age was a major factor predicting risk (Kleiman, Ostrom, & Cheesman, 2007). In a risk analysis conducted after the new risk tool was implemented and used by the judges, age was no longer predictive—not because age does not predict risk, but because age was now being used to assign treatment, and the treatment apparently reduced recidivism (Kleiman et al., 2007; Bushway & Smith, 2007). The meaning of the comparison depends critically on the initial conditions.
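The mechanism behind the Virginia result can be illustrated with a toy simulation. This is only a sketch under invented assumptions: the baseline risks, the size of the treatment effect, and the function name `recidivism_by_age` are all hypothetical and are not taken from Kleiman et al. (2007) or any other cited study. It shows that when a factor is used to assign an effective treatment, its observed association with recidivism can shrink or vanish even though the underlying risk difference is unchanged.

```python
import random

# Hypothetical numbers chosen for illustration only; they are NOT estimates
# from Virginia or from any study cited in this commentary.
BASELINE_RISK = {"young": 0.50, "old": 0.30}  # assumed untreated recidivism risk
TREATMENT_EFFECT = 0.20  # assumed drop in recidivism probability when treated


def recidivism_by_age(n, treat_young, seed=0):
    """Simulate n people and return the observed recidivism rate per age group.

    If treat_young is True, every young person receives the (assumed)
    risk-reducing treatment, mimicking a tool that assigns treatment by age.
    """
    rng = random.Random(seed)
    recidivists = {"young": 0, "old": 0}
    people = {"young": 0, "old": 0}
    for _ in range(n):
        group = "young" if rng.random() < 0.5 else "old"
        risk = BASELINE_RISK[group]
        if treat_young and group == "young":
            risk -= TREATMENT_EFFECT
        people[group] += 1
        if rng.random() < risk:
            recidivists[group] += 1
    return {g: recidivists[g] / people[g] for g in people}


# World 1: risk is measured before age is used to assign treatment.
before = recidivism_by_age(200_000, treat_young=False)
# World 2: risk is measured after age is used to assign treatment.
after = recidivism_by_age(200_000, treat_young=True)

gap_before = before["young"] - before["old"]
gap_after = after["young"] - after["old"]
print(f"young-old recidivism gap before the tool: {gap_before:.3f}")
print(f"young-old recidivism gap after the tool:  {gap_after:.3f}")
```

In this sketch, an analyst who only sees the second world would conclude that age carries little information about risk, although the untreated risk difference between the groups is the same in both worlds.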
It is extremely difficult to unwind the impact of these initial conditions, unless we know explicitly how sentencing was conducted. As RWC showed in their examination of the ProPublica analysis, false assumptions about the nature of the sentencing process will lead directly to false conclusions about the consequences of that sentencing. Transparent risk tools could help make the sentencing process clearer. However, in most sentencing environments, risk tools are advisory, and judges are free to sentence as they choose (Kleiman et al., 2007; Stevenson, 2018).
In Virginia, the tool had some impact (albeit not the desired 25% decline in incarceration), but the impact of the tool varied by judge (Garrett, Jakubow, & Monahan, 2019; Monahan, Metz, & Garrett, 2018). Although sentencing research tends to assume that the same factors are used in the same way by all actors in the system (Mitchell, 2005), the reality is that each judge and courtroom work group is using its own weighting system (Gottfredson, 1999). As a result, it is virtually impossible to unpack the prior system, even in a world where a transparent risk instrument is available to the judges.
More generally, it appears to be impossible to achieve consensus about what the goals of sentencing should be. Indeed, there is clearly no consensus that risk should be the driving factor. For example, in the final proposed draft of the American Law Institute’s Model Penal Code (American Law Institute, 2017), proportionality/retribution is identified as the primary goal of sentencing. Recidivism prevention and other utilitarian goals are allowed on the margin, and individual judicial discretion is preserved as an important goal before transparency. The current article abstracts from a conversation about the role of transparency relative to other goals. In the conversation around the Model Penal Code, transparency was one of many goals. Fortunately, transparency can coexist with a variety of sentencing goals, including retribution, deterrence, and recidivism prediction. However, it cannot as easily coexist with an emphasis on individualized sentences or judicial discretion, key features of many discussions about ideal sentencing structures.
Third, and finally, a sentencing system that emphasizes prior criminal history can become an active cause of growth in incarceration. For example, King (2019) has shown that in Minnesota, a state with a strict guideline system that emphasized prior history as a factor in sentencing, virtually all of the increase in an individual’s probability of incarceration from 1981 to 2013 can be attributed to an increase in prior criminal histories. More broadly, a short-term crime wave that differentially affects young people combined with a sentencing system that prioritizes prior history as a rule for punishment can lead directly to higher levels of criminal justice involvement over the life course for the birth cohort that came of age during the crime wave (Shen, Bushway, Sorensen, & Smith, 2019). Short-term crime waves will have fewer long-term consequences for the criminal justice system in sentencing environments that do not prioritize prior history.
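The cohort argument can be made concrete with a small expected-value calculation. This is a hedged sketch, not the model used by King (2019) or Shen et al. (2019): the conviction rates, the linear sentencing rule, and the function name `incarceration_path` are hypothetical, and the sketch ignores incapacitation and any feedback from punishment to later offending.

```python
def incarceration_path(conviction_rates, base=0.10, per_prior=0.10):
    """Expected per-period incarceration probability for one birth cohort.

    Simplifying assumptions (all hypothetical):
      * P(incarceration | conviction) = base + per_prior * prior convictions
      * convictions are independent across periods
      * nobody is incapacitated, and punishment does not change later offending
    """
    # Keep the sentencing rule a valid probability for every possible record.
    assert base + per_prior * (len(conviction_rates) - 1) <= 1.0
    expected_priors = 0.0
    path = []
    for rate in conviction_rates:
        path.append(rate * (base + per_prior * expected_priors))
        expected_priors += rate
    return path


# Two cohorts that differ only in a two-period crime wave early in life.
calm_cohort = [0.10, 0.10, 0.10, 0.10, 0.10, 0.10]
wave_cohort = [0.30, 0.30, 0.10, 0.10, 0.10, 0.10]

# Rule A: sentencing weights prior history. Rule B: it ignores prior history.
calm_prior_rule = incarceration_path(calm_cohort, per_prior=0.10)
wave_prior_rule = incarceration_path(wave_cohort, per_prior=0.10)
calm_flat_rule = incarceration_path(calm_cohort, per_prior=0.0)
wave_flat_rule = incarceration_path(wave_cohort, per_prior=0.0)

for period in range(2, 6):  # periods after the crime wave has ended
    print(
        f"period {period}: "
        f"prior-history rule {calm_prior_rule[period]:.4f} vs {wave_prior_rule[period]:.4f}, "
        f"flat rule {calm_flat_rule[period]:.4f} vs {wave_flat_rule[period]:.4f}"
    )
```

Under these assumptions, once the wave has passed the two cohorts face identical incarceration probabilities when the rule ignores prior history, while the wave cohort stays above the calm cohort in every later period when the rule weights prior history.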
The last 40 years of sentencing reform in the United States was launched primarily because of concerns about the lack of transparency and fairness in the dominant model of indeterminate sentencing (Frankel, 1973). The net result was the invention of new sentencing rules like sentencing guidelines, mandatory minimums, and truth-in-sentencing mandates that dramatically reduced judicial discretion. These rules reduced disparity in sentences, but at the same time, these new, less discretionary rules were often used to dramatically increase incarceration rates (National Research Council, 2014). Yet, there is no necessary connection between a fairer, more transparent system and harsher sentences. The Minnesota sentencing guidelines, which, unlike the federal guidelines, were designed specifically to limit prison growth, showed that guidelines could create a relatively simple, transparent sentencing system that reduced disparity without dramatically increasing incarceration rates (Frase, 2005).
The RWC article makes an argument that transparent risk tools can tackle the same problems of lack of transparency and disparity originally diagnosed by Frankel (1973). However, the experience of guidelines should have taught us that the devil is in the details, and that substantive sentencing rules, particularly those that reduce individual discretion, can affect overall incarceration as well as transparency and fairness. In this respect, RWC’s emphasis on transparency distracts the reader from the nature of the rules themselves. All else equal, I think we can all agree with RWC that transparent rules are better than black box ones. But the ultimate prescription for a risk tool that relies on criminal history and age has many consequences, many of which are rather opaque. Even if we agree that tools and rules should be simple and transparent, we need not agree that they should be based on prior history and age (Frase & Roberts, 2020). For example, the creators of the Model Penal Code have a very different view on what should guide sentencing.
Previous work has noted that a reliance on criminal history might create or perpetuate racial disparities (Harcourt, 2015). In my opinion, there are also deeper, less well understood problems with a reliance on criminal history. These include the fact that the prominent use of prior history can create new criminal behavior, as well as exacerbate the impact of short-term crime waves on the size of the criminal justice system. Moreover, the very fact that prior history is so correlated with recidivism could be an artifact of the current system, rather than an underlying truth about human behavior that truly identifies risk. The inherently black box nature of our current sentencing structures makes it very difficult to unpack the true meaning of any risk assessment conducted on data generated from our current sentencing structure, and should make us deeply skeptical about any policy that places all of our eggs in one basket. This is particularly the case when that basket—risk reduction—is just one of the many potentially relevant goals of sentencing.
Adelman, L., & Deitrich, J. (2008). Marvin Frankel’s mistakes and the need to rethink federal sentencing. Berkeley Journal of Criminal Law, 13, 239–260. https://doi.org/10.2139/ssrn.1393469
Aharonson, E. (2013). Determinate sentencing and American exceptionalism: The underpinnings and effects of cross-national differences in the regulation of sentencing discretion. Law and Contemporary Problems, 76, 161–187. Retrieved from https://scholarship.law.duke.edu/lcp
American Law Institute. (2017). Model Penal Code, proposed final draft. Retrieved from https://robinainstitute.umn.edu/publications/model-penal-code-sentencing-proposed-final-draft-approved-may-2017
Atwood, M. (2015). Morning in the Burned House. New York, NY: Houghton Mifflin Harcourt.
Bushway, S., & Smith, J. (2007). Sentencing using statistical treatment rules: What we don’t know can hurt us. Journal of Quantitative Criminology, 23, 377–387. https://doi.org/10.1007/s10940-007-9035-1
Chin, G. J. (2012). The new civil death: Rethinking punishment in the era of mass conviction. University of Pennsylvania Law Review, 160, 1789–1833. Retrieved from https://scholarship.law.upenn.edu/penn_law_review/
Denver, M., Siwach, G., & Bushway, S. (2017). A new look at the employment and recidivism relationship through the lens of a criminal record. Criminology, 55(1), 174–204. https://doi.org/10.1111/1745-9125.12130
Eisenberg, T., & Hans, V. P. (2009). Taking a stand on taking the stand: The effect of a prior criminal record on the decision to testify and on trial outcomes. Cornell Law Review, 94, 1353–1390. https://doi.org/10.2139/ssrn.998529
Frankel, M. (1973). Criminal sentences: Law without order. New York, NY: Hill and Wang.
Frase, R. (2005). Sentencing guidelines in Minnesota, 1978–2003. Crime & Justice, 32, 131–219. https://doi.org/10.1086/655354
Frase, R., & Roberts, J. (2020). Paying for the past: The case against prior record sentence enhancements. New York, NY: Oxford University Press.
Garrett, B., Jakubow, A., & Monahan, J. (2019). Judicial reliance on risk assessment in sentencing drug and property offenders: A test of the treatment resource hypothesis. Criminal Justice and Behavior, 46, 799–810. https://doi.org/10.1177/0093854819842589
Gottfredson, D. (1999, November). Effects of judges’ sentencing decisions on criminal careers. National Institute of Justice Research in Brief. Washington, DC: U.S. Department of Justice. https://doi.org/10.1037/e513192006-001
Harcourt, B. (2015). Risk as a proxy for race: The dangers of risk assessment. Federal Sentencing Reporter, 27, 237–243. https://doi.org/10.1525/fsr.2015.27.4.237
King, R. D. (2019). Cumulative impact: Why prison sentences have increased. Criminology, 57(1), 157–180. https://doi.org/10.1111/1745-9125.12197
Kleiman, M., Ostrom, B., & Cheesman, F. (2007). Using risk assessment to inform sentencing decisions for nonviolent offenders in Virginia. Crime and Delinquency, 53(1), 106–132. https://doi.org/10.1177/0011128706294442
Mitchell, O. (2005). A meta-analysis of race and sentencing research: Explaining the inconsistencies. Journal of Quantitative Criminology, 21, 439–466. https://doi.org/10.1007/s10940-005-7362-7
Monahan, J., Metz, A., & Garrett, B. (2018). Judicial appraisals of risk assessment in sentencing. Behavioral Sciences and the Law, 36, 565–575. https://doi.org/10.1002/bsl.2380
National Research Council. (2014). The growth of incarceration in the United States: Exploring causes and consequences. J. Travis, B. Western, & F. S. Redburn (Eds.). Washington, DC: National Academies Press. https://doi.org/10.17226/18613
Robinson, P. (2001). Punishing dangerousness: Cloaking preventive detention as criminal justice. Harvard Law Review, 114, 1429–1458. https://doi.org/10.4324/9781315258089-8
Rudin, C., Wang, C., & Coker, B. (2020). The age of secrecy and unfairness in recidivism prediction. Harvard Data Science Review, 2(1).
Shen, Y., Bushway, S. D., Sorensen, L., & Smith, H. L. (2019). Locking up my generation: Cohort differences in prison spells over the life course. Working Paper, University at Albany.
Spohn, C. (2008). How do judges decide? The search for fairness and justice in punishment (2nd ed.). Thousand Oaks, CA: Sage Publications. https://doi.org/10.4135/9781452275048
Stevenson, M. (2018). Assessing risk assessment in action. Minnesota Law Review, 103, 303–384. https://doi.org/10.2139/ssrn.3016088
Stith, K., & Cabranes, J. (1998). Fear of judging: Sentencing guidelines in the federal courts. Chicago, IL: University of Chicago Press.
This article is © 2020 by Shawn D. Bushway. The article is licensed under a Creative Commons Attribution (CC BY 4.0) International license (https://creativecommons.org/licenses/by/4.0/legalcode), except where otherwise indicated with respect to particular material included in the article. The article should be attributed to the authors identified above.