Originally written as the opening editorial for HDSR’s Vine to Mind symposium brochure, this slightly updated version highlights the intertwined nature of wine, statistics, and the broader realms of data science and AI. Revisiting the transformative 1976 Judgment of Paris, a landmark blind tasting event in the history of wine, this article illustrates both the power and complexity of statistical evidence. Transitioning from “bottle shock” to “future shock,” as we collectively navigate the disorientation caused by the rapid rise of generative AI, this article reflects on the evolving roles of data science and AI in the wine industry: tackling challenges such as climate change, enhancing the wine economy, and understanding consumer preferences. Ultimately, the narrative underscores the dual impact of technological advancements and the enduring communal nature of wine, inviting readers to contemplate their significance in human society and individual lived experiences.
Keywords: Judgment of Paris, variations, Vine to Mind, wine data, wine industry, wine tasting
Wine and celebration have long been intertwined throughout human history, exemplified by Winston Churchill's famous remark about Champagne: “In victory I deserve it; in defeat, I need it.” Thus, it is fitting that Harvard Data Science Review (HDSR), which aims to feature everything data science and data science for everyone, celebrated its 5th anniversary with a Vine to Mind symposium in June 2024. After all, everyone has an opinion on wine, especially those who don’t approve of it.
However, as a statistician who makes a living by navigating (extreme) variations, I also have professional reasons to choose this wine theme. The pedagogical reason came from the wine module of Harvard’s general education course “Real Life Statistics: Your Chance for Happiness (or Misery)” that I designed and taught with the help of my “happy team,” a group of passionate PhD students (Lock & Meng, 2010). As I detailed in my first article in a wine magazine (Meng, 2023), I got into wine (or rather, wine got into me) accidentally, thanks to a wine club owner who apparently lacked a proper education in risk assessment. The encounter provided me with ample German rieslings for blind tastings that illustrate how to conduct experiments scientifically (e.g., blinding; randomization) and analyze their results statistically.
The research reason for choosing the wine theme is that any data science method that can handle wine data well would likely do well in general, because of the intoxicatingly complex nature of quantitatively studying wine. Wine quality and quantity are influenced by a myriad of factors ranging from macro-level climate changes to the micro-level Saccharomyces cerevisiae (brewer's yeast). Moreover, our actual wine experience is shaped by numerous elements: how a bottle is stored, aged, and transported; how the wine is served (e.g., decanting, choice of glassware, serving temperature); what the wine is paired with (food, company, and atmosphere); and so forth.
On several occasions I enjoyed a wine in a restaurant or a friend’s house so profoundly that I ordered a case via my phone before my Uber ride got me home. When the case arrived, I eagerly anticipated reliving that exceptional experience, only to be very disappointed. Why did I order a case? Whereas bottle-to-bottle variations may play a role, a more sober statistical inference is that our wine experiences can be significantly influenced by those with whom we share a bottle (or two). As emotional beings, our judgments are almost instinctively swayed by how we feel, unless there are internally or externally imposed mechanisms to protect against the undue influence of our feelings. Therefore, May 24, 1976, is likely the most intoxicating day in the history of data science and, more broadly, empirical science. A group of oenophiles, including sommeliers, winemakers, and wine critics, gathered in Paris, France, to blind taste two flights: chardonnay and cabernet sauvignon. Each flight consisted of ten wines, six from California in the United States and four from Burgundy or Bordeaux in France. Judges were asked to rate all wines, independently, on a scale from 1 to 20. The scores of the nine highly regarded French judges were then totaled, and the wine with the highest score was declared the winner for each category. California’s Chateau Montelena and Stag’s Leap Wine Cellars, both from 1973, topped the white and red wine categories, respectively (Taber, 2005).
The shock wave of the news went beyond the wine world, helped by international media coverage, including TIME magazine’s article titled “Judgment of Paris” (1976) and later the Hollywood dramatization, Bottle Shock, directed by Randall Miller (2008). The shock occurred because until then, the widely held belief was that French wine was vastly superior to wine made anywhere else. The fact that the ratings were from French oenophiles and taken on French soil helped to address (legitimate) concerns that there could be experiential or cultural preferences that can defy blind tasting, such as one might suspect had the ratings come from oenophiles who had more experience with, or a preference for, California wines.
As a statistician, I cannot help but muse over another shocking fact: a small set of subjective ratings carried the degree of persuasion that overpowered an essentially universally held belief, almost instantly, and with long-lasting effects. Are there any other such examples in human history? When could that happen again, where virtually the whole globe could be persuaded almost instantly by a single small study, correctly or incorrectly, against pervasive preconceived belief?
The power of persuasion came from the scientific aspects of the tasting’s design, which had key elements that any statistics textbook on clinical trials would convey: minimizing confounding factors (through blind tasting), controlling measurement variations (by using French oenophiles), and securing sufficient statistical power (via nine independent replications).
As George Taber (the only reporter who attended the tasting event, despite the many who were invited) recounted in “A Stunning Upset” (Taber, 2005, Chapter 19), the implementation was not perfect. Some judges were checking with or teasing each other about whether they were tasting California or French wine. Ironically, such exchanges, as overheard by Taber, who had been given a list of the wines in the order of their being tasted, helped in this case to confirm the success of the blind tasting. As Taber wrote, “from their comments, though, I soon realized that the judges were becoming totally confused as they tasted the white wine” (p. 200).
The confusion ultimately led to “reaction ranged from shock to horror” (p. 202), as Taber wrote in the same chapter, when the organizer of the event, Steven Spurrier, a British wine merchant, announced the result. Clearly this is not the place to go deeper into all the statistical lessons learned from Taber’s account (e.g., the potential carryover effects on judges’ behaviors since the results for white were announced before tasting the red). But since the Vine to Mind symposium was about leveraging data science and AI technologies to improve winemaking and the wine economy, it is fitting to remind ourselves that data science done right brings both scientific and economic value. One could have had 900 judges, but if the tasting were not blinded, then the results would not be nearly as persuasive, if at all, as the Judgment of Paris, and the tasting would cost vastly more. Data quality trumps data quantity.
There is, however, another side of the 1976 story that has received little media attention, but also carries several critical lessons for data science. These lessons are especially timely as our society becomes increasingly data driven, and hence increasingly vulnerable to misinformation and disinformation.
Specifically, Spurrier’s way of analyzing the judges’ scores by adding up individual scores, while extremely common and practical, is much less principled statistically than the way the scores were collected. As Orley Ashenfelter (Vine to Mind opening keynote speaker) and Richard Quandt point out in their analysis of the individual scores (Ashenfelter & Quandt, 1999), simple averaging or totaling, while appearing on the surface to achieve fairness by treating every judge equally, in effect gives more influence to those judges who tend to use more extreme ratings to express their strong preferences, because a simple average or total is sensitive to outliers. Evidence of judges’ preferential use of extreme values is present in the individual scores as provided in Taber (2005).
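To make the sensitivity of totals concrete, here is a minimal sketch in Python. The scores are entirely hypothetical (they are not the 1976 data), and the rank-sum summary is only in the spirit of a rank-based analysis, not a reproduction of any published method. Four judges mildly prefer wine B, while a fifth uses the scale far more aggressively.

```python
# Hypothetical scores (NOT the 1976 data): rows are judges, columns are wines A, B, C.
WINES = ["A", "B", "C"]
SCORES = [
    [14, 15, 12],  # judges 1-4 mildly prefer B
    [14, 15, 12],
    [14, 15, 12],
    [14, 15, 12],
    [19, 3, 10],   # judge 5 uses extreme scores and strongly dislikes B
]


def total_scores(scores):
    """Sum each wine's scores across judges (the simple totaling approach)."""
    return [sum(column) for column in zip(*scores)]


def ranks_for_judge(row):
    """Convert one judge's scores to ranks (1 = best). Assumes no ties."""
    order = sorted(range(len(row)), key=lambda i: -row[i])
    ranks = [0] * len(row)
    for position, wine_index in enumerate(order, start=1):
        ranks[wine_index] = position
    return ranks


def rank_sums(scores):
    """Sum each wine's within-judge ranks; a lower sum means more preferred."""
    per_judge = [ranks_for_judge(row) for row in scores]
    return [sum(column) for column in zip(*per_judge)]


totals = total_scores(SCORES)
rsums = rank_sums(SCORES)
winner_by_total = WINES[totals.index(max(totals))]
winner_by_rank = WINES[rsums.index(min(rsums))]

print("Totals:", dict(zip(WINES, totals)), "-> winner:", winner_by_total)
print("Rank sums:", dict(zip(WINES, rsums)), "-> winner:", winner_by_rank)
```

In this toy example, the total crowns wine A even though four of five judges prefer B, because the fifth judge's very low score for B outweighs the others; the rank sums, which cap each judge's influence, favor B.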
For example, for chardonnay, all nine judges ranked the 1973 David Bruce the worst, and yet their scores ranged from 0 to 8, even though 0 was not allowed; one judge must have really hated it! Incidentally, the complete agreement by nine judges on the worst wine suggested that the bottle for the tasting was spoiled (e.g., due to improper transportation), and not necessarily that the vintage is inferior in general. Taber’s observation that some judges dumped the wine into spit buckets after only smelling it (p. 201), even though the 1973 David Bruce had a great reputation before the tasting (pp. 167–169), supports this inference. This incident reminds us that we should always think beyond the numbers, especially when they look (too) good: what’s wrong with all judges agreeing? It should remind us of the variability and vulnerability of empirical studies, and the care we must take to execute them. It is certainly disturbing that a single bottle could ruin the reputation of an entire vintage, but mis-shipments and mishaps happen.
However, how to properly analyze and summarize judges’ scores and what reliable conclusions one can draw from them are not trivial matters. Following Ashenfelter and Quandt’s (1999) analysis, which used a nonparametric approach with ranked data, multiple studies based on other methods were conducted. For example, Lindley (2006) took a full Bayesian approach, while Cicchetti (2004) followed a parametric but frequentist approach. They both arrived at some conclusions that are different from those of Ashenfelter and Quandt (1999). In particular, Cicchetti (2004, p. 214) wrote that the reported scores “give little credence to the title of Prial’s (2001) New York Times article: ‘The day California shook the world’.”
Whether this title was hype or not depends on the unwritten but critical part of the title: what shock did it refer to? Cicchetti (2004) pointed out correctly that statistically it’s not meaningful to distinguish the first-ranked 1973 Stag’s Leap from the second-ranked 1970 Chateau Mouton when their average scores are respectively 14.14 and 14.09, each with a (between-judge) standard deviation over 1.7. Lindley's (2006) Bayesian approach reached a similar conclusion, with his Bayesian probability that Stag’s Leap is better than Mouton being only 52%, nearly a toss-up.
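A rough back-of-the-envelope calculation shows why a gap of 0.05 points is negligible at this level of variability. The Python sketch below is not a reconstruction of Cicchetti's or Lindley's analysis. It assumes, purely for illustration, nine judges (the number of judges behind these averages is not stated here), a common standard deviation of exactly 1.7 (the stated lower bound), independence between the two wines' scores (the same judges scored both, so a paired analysis would differ), and a normal approximation.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def rough_comparison(mean_a, mean_b, sd, n_judges):
    """Compare two average scores under simplifying assumptions.

    Assumptions (illustrative only): both wines share the same
    between-judge standard deviation `sd`, the two sets of scores are
    independent, and the difference in means is approximately normal.
    """
    se_diff = sd * math.sqrt(2.0 / n_judges)   # standard error of the difference
    z = (mean_a - mean_b) / se_diff            # gap measured in standard errors
    return se_diff, z, normal_cdf(z)


# n_judges=9 and sd=1.7 are assumptions, as explained above.
se_diff, z, prob = rough_comparison(14.14, 14.09, sd=1.7, n_judges=9)
print(f"standard error of the difference: {se_diff:.2f}")
print(f"gap in standard errors: {z:.2f}")
print(f"normal-approximation value of Phi(z): {prob:.2f}")
```

Under these assumptions the gap is well under a tenth of a standard error, so the data cannot separate the two wines. Because 1.7 is only a lower bound on the standard deviation, the true gap in standard-error units would be smaller still.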
These types of more nuanced investigations and results occur frequently in data science, especially for subjects involving many factors and large variations, among which judging wine surely ranks at the top. Climate change is another subject with many factors and with an extremely low signal-to-noise ratio. The lower the signal-to-noise ratio, the harder it is to reveal the signal empirically. But in this case, we should all be thankful, because if the increase in average temperature over a decade were anywhere near the magnitude of the temperature variations we observe in our daily life, well, we would have no life to observe within days. Assessing climate change is surely more consequential than blind tasting wine; indeed, the impact of climate change on the wine industry was one of the two main themes of the Vine to Mind symposium. But the statistical and data science insights and methods for separating revelatory variations (e.g., signal) from obfuscatory variations (e.g., noise) are fundamentally the same. For most scientific problems we do not have the luxury of ‘blind tasting,’ making it paramount for us to develop and test our insights and methods in areas where we can do so. This is why particle physicists employ and promote ‘blind analysis,’ as overviewed in the HDSR article by physicists Thomas Junk and Louis Lyons (2020), to guard against their own human biases, which is the very same principle behind blind tasting.
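As a small illustration of why a low signal-to-noise ratio makes a signal hard to reveal, the sketch below uses the textbook fact that the standard error of an average of n independent observations is the noise standard deviation divided by the square root of n. The function name, the threshold of two standard errors, and the independence assumption are all illustrative choices, not anything taken from the studies discussed here.

```python
import math


def observations_needed(snr, z=2.0):
    """Approximate number of independent observations needed for a signal
    to stand `z` standard errors above zero.

    `snr` is the signal size divided by the noise standard deviation.
    Since the standard error of a mean is noise / sqrt(n), requiring
    signal / (noise / sqrt(n)) >= z gives n >= (z / snr) ** 2.
    """
    if snr <= 0:
        raise ValueError("snr must be positive")
    return math.ceil((z / snr) ** 2)


# Each tenfold drop in signal-to-noise ratio demands about 100 times more data.
for snr in (1.0, 0.1, 0.01):
    print(f"signal-to-noise {snr}: about {observations_needed(snr)} observations")
```

The required sample size grows with the inverse square of the signal-to-noise ratio, which is one way to see why careful design matters more than sheer volume when the signal is faint.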
Fortunately, in the case of the Judgment of Paris, the use of the sensational word ‘shock’ is statistically justifiable in one regard. All the analyses of the individual judge scores share one common finding: for both red and white wines, there was one California wine that cannot be declared to be dominated by any French wine in a statistically meaningful way. And that is as shocking (or exciting) as finding a counterexample to a mathematical conjecture that was once universally believed to be true. As Robert Parker emphasized correctly, “The Paris Tasting destroyed the myth of French supremacy and marked the democratization of the wine world. It was a watershed in the history of wine” (Taber, 2005, p. 211).
The 1976 shock wave was based on Spurrier’s simplistic summary instead of the more principled analyses, and word spread quickly and has had global and lasting impact. In this case, luckily, the simplistic summary largely conveyed a correct message. Nevertheless, this historical incident, in which a small local experiment quickly had global implications, should remind all of us of the critical importance of ensuring reliable data, results, and communication, especially as we now live in a much more globally connected world than half a century ago, and it will be increasingly so as we move into the future.
Speaking of the future, HDSR just published a special issue on “Future Shock: Grappling With the Generative AI Revolution.” The term “Future Shock” is borrowed from the title of a 1970 book by sociologist Alvin Toffler, who used the term to capture the widespread societal dislocation effected by the rapid advent of the digital revolution, or more broadly “the dizzying disorientation brought on by the premature arrival of the future” (p. 19).
Whether generative AI amounts to a future shock as Toffler described is still a matter of debate (Leslie & Perini, 2024). But there is little doubt that generative AI is a disruptive technology with massive societal impact, positive and negative, as the “Future Shock” special issue captures and reflects upon. The Vine to Mind symposium explored the actual and potential impact on the wine industry of both timely AI technologies and time-honored data science theory and methods, especially in economics and statistics.
None of us can confidently predict the future, but we can always contemplate it. What will turn out to be the key AI technologies for the wine industry? What kind of vineyards will more successfully integrate data science in their approach to sustainable high-quality winemaking? What are the new data science insights and lessons that can be learned from these new developments? And, just for fun, will there ever be an AI taster, and if so, how well can it do with blind tasting? Will it be better than humans or will it too be humbled by the protean nature of wine?
To remind ourselves how hard it is to predict the future, the 1972 documentary Future Shock, directed by Alexander Grasshoff and based on Toffler’s book, may give us past shock. To mirror Toffler’s definition, I will define ‘past shock’ as the dizzying disorientation brought on by the premature predictions of the past. The documentary discussed changes that, by its predictions, were in the making or arriving soon. These include a dizzying array: from gay marriages to group marriages, from artificial intelligence to artificial man, and from electric mood stimulators to medical IQ boosters.
Luckily, the bottle I paired with watching the documentary was an excellent stabilizer, allowing me to turn my disorientation into contemplation as a future me looking back. Would we provide as many past shocks to future generations as Toffler’s generation did to us? What opportunities did we miss because we were shortsighted? What risks did we take that were unwise? And most importantly, what foolish mistakes have we made, out of our overconfidence or greed, that the future me would have to consume a goliath before forgiving (or forgetting) the current me?
Regardless of how the future unfolds, there is one prediction on which I am willing to risk all my statistical reputation (if I still have any left) by putting it in writing. Wine connects us, humbles us, and equalizes us. No matter who we are, or who we think we are, blind tasting will always make us nervous and vulnerable.
We will always be reminded of how incapable human brains are of processing and navigating variations, and how easily we are fooled by our overconfidence or prejudices. But above all, when the truth is revealed and we are proved to be wrong, and very wrong, there is no sadness, resentfulness, defensiveness, excuses, or even embarrassment. Instead, we laugh, we reflect, and most importantly, we immerse ourselves in the sheer joy of being silly together, learning together, and being completely honest with one another. Where else in life can we have such a humbling yet joyful experience?
In vino we trust (and celebrate). For everything else, there is data science (and HDSR).
There are literally over a thousand people to thank for making HDSR possible, and for its having published over 450 articles in its first 5 years. These include all board members; authors; reviewers; partners; as well as HDSR’s publisher, MIT Press; our editorial office; Harvard’s leadership, especially from the Office of the Provost and from the Harvard Data Science Initiative; and many supporters and friends. The Vine to Mind symposium itself is another example of tremendous teamwork, and to all involved I will raise a bottle (a glass is insufficient to express my gratitude) to say, “THANK YOU!” But a magnum is in order for Don St. Pierre, whom I had the great fortune to meet in his “Wine Hotel” in Shanghai over a decade ago. Our shared passion for wine and education has made for an intoxicating journey together, as recounted in Meng (2023). The same goes to Karl Storchmann, the editor of Journal of Wine Economics, for making this symposium a reality, and with whom I am looking forward to many intoxicating ventures together. Last but most of all, to Amara Deis, Elizabeth Langdon-Gray, and Rebecca McLeod, without whose dedication to handling many tasks and challenges, no wine could cheer me up, even those that are served at the symposium.
To all the participants from the wine industry who provide wine, thank you for helping to lubricate both the human and AI engines. To those who want to contribute but to whom we must give a rain check, because there is only so much excitement a higher education institution can take in one year, let’s work together to ensure that there will be many Vine to Mind or Mind to Vine symposiums in years to come. After all, ‘symposium’ literally refers to ‘drinking together’ in ancient Greek society.
Xiao-Li Meng has no financial or nonfinancial disclosures to share for this editorial.
Ashenfelter, O., & Quandt, R. (1999). Analyzing a wine tasting statistically. Chance, 12(3), 16–20. https://doi.org/10.1080/09332480.1999.10542152
Cicchetti, D. V. (2004). Who won the 1976 blind tasting of French Bordeaux and US Cabernets? Parametrics to the rescue. Journal of Wine Research, 15(3), 211–220. https://doi.org/10.1080/09571260500109319
Grasshoff, A. (1972). Future Shock [Film]. Metromedia Producers Corporation.
Junk, T., & Lyons, L. (2020). Reproducibility and replication of experimental particle physics results. Harvard Data Science Review, 2(4). https://doi.org/10.1162/99608f92.250f995b
Judgment of Paris. (1976, June 7). TIME. https://web.archive.org/web/20151108131748/http://content.time.com/time/subscriber/article/0,33009,947719,00.html
Leslie, D., & Perini, A. M. (2024). Future shock: Generative AI and the international AI policy and governance crisis. Harvard Data Science Review, (Special Issue 4). https://doi.org/10.1162/99608f92.88b4cc98
Lindley, D. V. (2006). Analysis of a wine tasting. Journal of Wine Economics, 1(1), 33–41. https://doi.org/10.1017/S1931436100000079
Lock, K., & Meng, X.-L. (2010). Real-life module statistics: A happy Harvard experiment. In C. Reading (Ed.), Data and context in statistics education: Towards an evidence-based society: Proceedings of the Eighth International Conference on Teaching Statistics (ICOTS8, July, 2010), Ljubljana, Slovenia. International Statistical Institute. https://iase-web.org/documents/papers/icots8/ICOTS8_4D1_LOCK.pdf?1402524970
Meng, X.-L. (2023). Seeking simplicity in statistics, complexity in wine, and everything else in fortune cookies. Fondata, 3, 18–27.
Miller, R. (2008). Bottle shock [Film]. Shocking Bottle; Zin Haze Productions; Intellectual Properties Worldwide; Unclaimed Freight.
Prial, F. J. (2001, May 9). The day California shook the world. New York Times. https://www.nytimes.com/2001/05/09/dining/wine-talk-the-day-california-shook-the-world.html
Taber, G. M. (2005). Judgment of Paris. Simon and Schuster.
Toffler, A. (1970). Future Shock. Bantam Books.
©2025 Xiao-Li Meng. This editorial is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the editorial.