The greatest tool is the hardest to use. Machine learning has earned its title as “the most important general-purpose technology of our era” (Brynjolfsson & McAfee, 2017)—but it is notoriously difficult to launch. Outside a handful of leading companies, machine learning initiatives routinely fail to deploy and improve business operations. On the one hand, the world has become enamored with machine learning as the impressive, advanced technology that it is. On the other, this has placed a disproportionate focus on the technical capability, shortchanging what should be the primary focus: its value for the business. Since machine learning’s value is captured only by deploying it to enact operational change, the change must be proactively managed like any other—but leaders often neglect to do so. In this article, I illustrate some key drivers of internal resistance to such change and how one company overcame them. UPS achieved broad cooperation in overhauling its entrenched package delivery–planning process with machine learning, saving millions of driving miles.
Keywords: machine learning deployment, machine learning leadership, machine learning management, machine learning project success, business analytics, data science
So close and yet so far. Machine learning projects often stop just shy of achieving business value (Davenport & Malone, 2021). The data scientist generates a viable predictive model, but it fails to deploy: the organization struggles to integrate it to improve operations (Siegel, 2022). Ironically, in the commercial use of this technology, the ‘rocket science’ has turned out to be the easy part. The challenge is getting it launched (Siegel, 2018).
Like so many pioneers of optimization who strive to help the enterprise, Bill Scherer has experienced his share of surprising, disappointing opposition to machine learning’s liftoff. A University of Virginia systems engineering professor, Bill delivers optimization—and yet, the world will not always accept his gifts. He once arranged a project for one of his students to optimize the trash collection truck routes for a rural county within Virginia's Shenandoah Valley. With some virtuoso number crunching, his student discovered that there was more room for improvement than anyone had imagined. She showed that the operation could do its job with only half as many garbage trucks driving each day.
Bill and his student met with the trash collection officials to excitedly present their newfound opportunity to improve efficiency. At the project's inception, the officials had been cooperative, volunteering their time and data to see just what these ‘university data wizards’ might find. Bill expected this unveiling to be met with oohs, ahhs, and perhaps some applause.
Instead, the officials said, ‘Oh no, we could never change our driving routes.’
Data scientists often encounter what feels like obstinacy: users stuck in their ways. To them, the value of the model seems to be a no-brainer, and the prescribed change to operations that would capture this value seems mundane in comparison to the prowess of the advanced analysis.
But, if people are creatures of habit, it is often for a better reason than agents of change realize. The garbage truck drivers of the Shenandoah Valley had developed their routines not only as a means to get their work done but as a lifestyle. One visits their mother on the way home. Another swings by their favorite bakery.
Bill puts this kind of experience into perspective. “I emphasize to my students that it’s an imperative to consider the full context of the problem,” he says, “including the goals and associated values of all stakeholders” (personal communication, October 10, 2022).
Ultimately, data scientists can recognize the downside to disrupting the small regional processes of a community in the name of optimization, even when fossil fuel consumption and cold hard cash are on the line. On the other hand, when it comes to a nationwide operation, the sheer numbers exert a greater force of change—one that just may be powerful enough to teach old dogs new tricks. But as we will see, changing the habitual routines of employees requires some ‘tricks’ on the part of management.
In 2005, Jack Levis was in deep water at UPS. As the senior director of process management, he had overseen the development of a delivery-prediction model that would optimize the assignment of packages to delivery trucks. So far, trial runs of Package Flow Technology (PFT)—the system that deployed the model—had only delivered disappointment. “Things were really ugly internally,” Jack reflects. “It was a nightmare” (personal communication, August 5, 2021).
But this was not only an internal affair. The media had caught wind of it and blown the lid off. “New Package Flow Technology Not Delivering at UPS,” screamed a Computerworld headline. The feature story continued, “Its highly touted Package Flow Technology isn’t flowing as smoothly as expected, with problems at about a third of the 300 or so centers where it has been implemented” (Rosencrance, 2005).
Within an office shut off from the kerfuffle, the chief operating officer (COO) of UPS chastised Jack privately. It was a heated discussion and the fallout would reverberate for a long while. Even 2 years later when they were discussing the next project iteration, the COO looked Jack dead in the eye with the stare that only a Fortune 500 executive could muster. “I don’t want another Package Flow—don’t you dare do that” (personal communication, August 5, 2021).
But Jack had good reason to defend his innovation: The problems so far were not in the technology—they were in the humans. To deploy delivery prediction at UPS was to ask people to change their habitual routines and embrace a new paradigm. The story is so often the same: The deployment plan for machine learning was easier said than done (Bean, 2021).
Imagine that you run a typical UPS shipping center where 55 trucks leave every morning, each tasked with delivering 300 packages that day. Your job is to decide exactly how to distribute these 16,500 packages among the trucks—as illustrated in Figure 1—so that the overall operation requires as few miles and as few driver hours as possible. To complicate things, some deliveries are committed for a specific time of day, plus no driver’s shift can extend too long. No pressure.
Now multiply this problem by one thousand. The system you develop must handle these logistics every day for 1,000 shipping centers in the United States. Across this mammoth operation, every moment counts. One minute per driver per day costs $14.5 million per year. Likewise, one mile per driver per day is worth $50 million per year. Really, though, no pressure.
Overall, millions of gallons of fuel and thousands of metric tons of emissions are on the line annually. Okay, maybe there is a little pressure.
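The scale described above can be checked with some back-of-the-envelope arithmetic. The following is only a rough sketch: the truck, package, center, and dollar figures come from the text, while the one-driver-per-truck mapping and the count of 250 operating days per year are assumptions introduced here for illustration.

```python
# Figures quoted in the text.
TRUCKS_PER_CENTER = 55
PACKAGES_PER_TRUCK = 300
CENTERS = 1_000
COST_OF_ONE_MINUTE_PER_DRIVER_PER_DAY = 14_500_000  # USD per year

# Assumptions (not from the article): one driver per truck,
# and roughly 250 operating days per year.
OPERATING_DAYS_PER_YEAR = 250

packages_per_center = TRUCKS_PER_CENTER * PACKAGES_PER_TRUCK
drivers = TRUCKS_PER_CENTER * CENTERS
packages_per_day = packages_per_center * CENTERS

# One minute per driver per day, summed over a year of operating days.
driver_minutes_per_year = drivers * OPERATING_DAYS_PER_YEAR
implied_cost_per_driver_minute = (
    COST_OF_ONE_MINUTE_PER_DRIVER_PER_DAY / driver_minutes_per_year
)

print(f"Packages per center per day: {packages_per_center:,}")
print(f"Drivers nationwide: {drivers:,}")
print(f"Packages nationwide per day: {packages_per_day:,}")
print(f"Implied cost per driver-minute: ${implied_cost_per_driver_minute:.2f}")
```

Under these assumptions, the quoted $14.5 million works out to about a dollar per driver-minute, which is why shaving even a minute from each route adds up so quickly.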
But here is the real kicker: The system must work with incomplete information. Shipping centers must begin the lengthy process of planning and loading the trucks before all of tomorrow’s deliveries have become known. Many delivery destinations do not become apparent until the wee hours of the morning.
Jack calls this The Delivery Paradox (personal communication, July 31, 2021). You cannot optimally plan the truck loading until you know all the deliveries that will need to be made. But by the time you know all the deliveries, you have run out of time to load the trucks.
This enormously complicates the problem. After all, every package matters for the overall plan. If an unforeseen last-minute package shows up after the trucks are loaded, it could add miles to a truck’s existing plan. If you had known earlier, you might have distributed the packages completely differently among trucks. But you are out of time. The fully loaded trucks are headed out and redistributing the packages would take too long.
Jack recognized that The Delivery Paradox was a central dilemma since shipping centers faced a plethora of unforeseen packages every day. At the time, up to 30% of deliveries still were not in the system when the planning for the next day had to begin. This was because many packages that arrived on overnight flights had missing or only partial tracking information. Some shipping customers were late to upload data about their shipments or used noncompliant or glitchy systems to do so. Unexpected delays caused by factors like the weather could be slow to percolate. Throughout the long night of loading, some ‘dumb’ packages would even show up without proper coding, so handlers then had to manually enter the destination address on the spot.
Much of this information latency persists today; it is largely unavoidable, despite various improvements UPS has made to its systems. For example, suppose that a package will fly this afternoon from the West Coast to an East Coast shipping center for delivery tomorrow morning. If the center begins its planning at midday, it is still too early on the West Coast for the destination address to have been uploaded. As another example, even if all delivery addresses have been uploaded, the number of stops each truck will need to make, each costing precious time, is often unknown until the deliveries are actually made. A large building or strip mall with multiple recipients, for instance, could turn out to require multiple stops. On top of all this, some broad-strokes planning must be completed days ahead, such as booking the right number of drivers.
In a system as complex as UPS's internal package network, uncertainty is an inherent predicament. The antidote is prediction.
Jack’s system at UPS, Package Flow Technology, predicts tomorrow’s package deliveries so that it can plan for them. These predicted deliveries augment the list of known packages, as shown in Figure 2.
PFT can form a complete plan with this augmented batch of delivery destinations and trigger the overnight loading process with time to spare. Trucks are typically loaded from about 4 a.m. to 7 a.m., so the planning must begin earlier—in the evening or even during daytime hours for some shipping centers.
Let us look at the mechanics of how this batch of delivery predictions is formed. First, a predictive model generates each individual prediction, one at a time. The model was generated with machine learning for this very purpose. It encodes patterns learned from past data that now serve to put odds on what will happen in the future.
To be specific, the model is applied repeatedly, performing its calculations for each possible delivery address, as illustrated in Figure 3. For the United States, that is 200 million predictions. Then, all the most probable destinations—say, those scored as being more than 80% likely—are combined with the list of known destinations.
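The scoring-and-thresholding step described above can be sketched in a few lines. This is a minimal illustration, not UPS's implementation: the function name, the addresses, and the probability scores are all hypothetical, and the dictionary of scores stands in for whatever the real predictive model outputs.

```python
from typing import Dict, Set


def augment_destinations(
    known: Set[str],
    delivery_probability: Dict[str, float],
    threshold: float = 0.80,
) -> Set[str]:
    """Return known destinations plus every address scored above the threshold."""
    likely = {
        address
        for address, probability in delivery_probability.items()
        if probability > threshold
    }
    return known | likely


# Hypothetical addresses and model scores, for illustration only.
known = {"12 Oak St", "98 Elm Ave"}
scores = {
    "12 Oak St": 0.97,   # already known, so it is not added twice
    "5 Pine Rd": 0.91,   # above the threshold, so it is added as a prediction
    "40 Lake Dr": 0.35,  # unlikely, so it is left out
    "7 Hill Ct": 0.80,   # exactly at the threshold, so "more than 80%" excludes it
}

planned = augment_destinations(known, scores)
print(sorted(planned))
```

Where to set the threshold is a business trade-off: a lower cutoff plans for more packages that never arrive, while a higher one leaves more last-minute surprises.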
PFT regularly updates these predictions—roughly every 2 minutes—until the trucks head out. Throughout the overnight loading process, some predicted deliveries become known, since the package actually shows up—plus, other unforeseen packages also come in. The system revises the plan accordingly. As a result, some packages may be moved from one truck to another, but most of the prediction-based plan remains intact without imposing time-consuming changes. By morning, any incorrectly predicted deliveries that never materialized as real packages are dropped from the plan. In the end, as the trucks head out with their packages, they no longer need delivery predictions—but the predictions are what got them well planned and fully loaded in time for that day’s expedition.
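The final reconciliation described above, in which predictions that never materialized are dropped and unforeseen arrivals are surfaced, might look something like the following. It is only a sketch under stated assumptions: the function name, the address-to-truck mapping, and the sample data are hypothetical, and the logic that the real system uses to reassign packages between trucks is not shown.

```python
from typing import Dict, Set, Tuple


def finalize_plan(
    plan: Dict[str, str], arrived: Set[str]
) -> Tuple[Dict[str, str], Set[str]]:
    """Keep planned stops whose package arrived; report arrivals with no planned truck."""
    # Predicted stops that never materialized as real packages are dropped.
    final_plan = {
        address: truck for address, truck in plan.items() if address in arrived
    }
    # Packages that arrived without being known or predicted still need a truck.
    unplanned = arrived - set(plan)
    return final_plan, unplanned


# Hypothetical plan built from known and predicted destinations.
plan = {
    "12 Oak St": "truck A",
    "5 Pine Rd": "truck A",   # predicted only
    "40 Lake Dr": "truck B",
}
# Hypothetical set of packages that actually showed up by morning.
arrived = {"12 Oak St", "40 Lake Dr", "7 Hill Ct"}

final_plan, unplanned = finalize_plan(plan, arrived)
print(final_plan)
print(unplanned)
```

In this toy example most of the plan survives untouched, which mirrors the point in the text: the better the predictions, the fewer packages need to be moved or assigned at the last minute.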
When people think of sexy machine learning projects, they do not usually think of optimizing brick and mortar logistics. But many projects that attract a lot of attention do little to transform business. They promise to deliver value in the long run, but so far they have enacted little to no change and they will not be moneymakers any time soon. Someday, fully autonomous cars will save countless lives, but impediments to their wide-scale deployment prevail, with some estimating that it will take decades (Chafkin, 2022). Likewise, IBM’s computer that defeated the humans on the quiz show Jeopardy! excited me in 2011 like no technology ever had—but its specialized skill does not readily generalize to practical tasks (Lohr, 2021). Generative AI systems, which are built with machine learning, generate images and text—often in such an adept and seemingly humanlike manner as to give you the impression that they embody an ‘understanding’ of human concepts and that they can express these concepts with language and images. Those particular capabilities may well prove valuable to the enterprise, but they are not designed to address operational efficiencies in the straightforward manner we are discussing here. And when machine learning conquers chess, Go, and complex video games, it impresses the best of us. But the real value comes only from world-changing deployment.
Instead, the most valuable machine learning projects directly improve established, large-scale business operations. The United Parcel Service has been a sturdy complement to the U.S. Postal Service for more than a century. It is the world’s largest courier, with higher revenue than even FedEx. This is not some hot new tech company. No, this is precisely the kind of dinosaur that runs society’s essential operations—entrenched processes begging to be streamlined, even while many in charge fight tooth and nail against change.
Jack’s job title flatly reaffirmed this point: senior director of process management at UPS. He was not VP of Machine Learning. In fact, his title did not name any technology whatsoever. Despite a growing trend that job titles name a technology—from data science to AI—he resolutely stuck with a title focused on the value. His focus was on the ends—process improvements—not the means. Having been at the company for a few decades, he was in charge of operations technology and oversaw six divisions. He was not among the echelons of executives whom he now had to convince for approval. He worked on operations directly, right where it counts. He was situated to enact real change personally.
But to reap the benefits of change, that change must be… executed! Even though it was driven by machine learning, the UPS project faced change-management challenges just the same as any other operational overhaul.
Out with the old, in with the new. PFT was designed to boost efficiency by replacing each shipping center’s legacy process with one that is more automated and centralized:
Legacy process: Each day, humans assign the delivery regions (Sequences) that each truck must cover. Many of these decisions are made while loading the trucks, during which staff adjust the assignments in an ad hoc manner as they deem necessary. This sometimes means reassigning packages that have already been loaded, shifting them from one truck to another.
Updated process: The PFT system centralizes and semi-automates the assignment of Sequences to trucks, based largely on predicted deliveries. Just before truck loading begins, a planning manager completes final adjustments through a central PFT console in hopes that little to no further revisions will take place on the fly during the loading process.
If adopted fully, this process change would radically improve the efficiency of operations: It would decrease the mileage—and the time clocked by drivers—accumulated across the entire fleet of trucks. It could accomplish this because of two fundamental advantages over the legacy process. First, it dynamically incorporated the prediction of as-yet unknown deliveries in order to plan and begin loading the trucks early for on-time departures. Second, it centralized decision-making to be across all the shipping center’s trucks at once. This would beat the legacy process’s distributed decisions made by individual truck loaders on the fly while loading.
With the PFT system in place, these two advantages held even when managers made manual adjustments to the plan. When they did so, it was at a central console, with a bird’s-eye view across all the trucks going out that day. Furthermore, the console incorporated the day’s predicted deliveries along with known deliveries. As a manager revised the plan onscreen, it displayed the forecasted effect based on both known and predicted deliveries.
But the system had some ‘bugs’: the humans—in particular, those who were carrying out its instructions. If the staff loading the trucks overrode the centralized decisions too often, the benefits of PFT and delivery prediction would vanish. Changing a package’s truck assignment meant not only enacting a potentially suboptimal decision without the bird’s-eye perspective provided by the central console, but it could also mean inefficiently moving packages that were already loaded. This risked delaying trucks so they would not depart on time. Moreover, when staff vetoed the system and loaded a package onto another truck, they typically would not update the system. This meant the physical world and the digital world did not align—and that spelled trouble. Following the data on their handheld, one driver would go to deliver a package that was in actuality on another truck and the driver of that other truck would not even know they had the package. To address this misalignment, Jack formulated a new mantra for his staff: “The data is as important as the delivery” (personal communication, September 14, 2022).
This is where the teaching of new ‘tricks’ came in. With the legacy process, staff had applied their hard-earned knowledge and experience. If a seasoned truck loader saw a package with a delivery address that they recognized, they would reflexively say, ‘Oh, that’s got to go on the truck with this other package.’ To realize the potential gains in efficiency, Jack and his team would have to convince and reorient staff to follow a preordained plan while loading the trucks.
A key to success of [a value-creating change] process is dialogue between the people who will make the decision and those closest to the business who will be influenced by and expected to implement the decision. (Bodily & Allen, 1999, p. 3)
Organizational transformations are prone to failure… In order for transformation to be successful, leaders must approach it in ways designed to... drive emotional commitment from employees. (White et al., 2022)
Two-thirds of my effort was deployment, versus models and build with IT.
—Jack Levis (personal communication, September 20, 2022)
The efficacy of technology so often comes down to human adoption (Ross & Taylor, 2021). “The most difficult part of my job is not actually working with mathematicians to come up with a beautiful model to solve a problem,” lamented a keynote speaker years later, as he strode across the stage at Machine Learning Week 2022. As the conference chair, I had enlisted him as another leader from UPS, network planning and optimization director Yentai Wan (Wan, 2022). He continued, “The most difficult part of my job is actually deployment. It’s the so-called change management. How do I convince those end-users to switch from the legacy system and leverage the modernized technology we build out?”
With PFT’s effectiveness in question, Jack felt the heat. But he and his team still saw the same potential as always, even if the payoff was presently delayed. The problem was in the human piece, not the technical system. Jack and his team had underestimated the effort required to gain widespread buy-in and compliance. It was time to follow through in that effort, rather than back down.
So they doubled down on change-management efforts. At each shipping center, the training team would have to stick it out, refusing to leave until performance results were attained. Transferring knowledge was not enough. The center’s staff might be fascinated by the new system, but that excitement was often just a flash in the pan. Left on their own too quickly, they would return to old routines.
How do you reform stubborn creatures of habit? There are always sheer will and an iron fist. Big change requires some law enforcement. The team supervised, cajoled, and even micromanaged a bit. For example, loaders who struggled to break old habits were reassigned to new areas with which they were not familiar, where they would not recognize delivery addresses. You cannot take the knowledge out of a person, but you can take the person out of their domain of knowledge.
But babysitting and arm twisting only go so far. Rather than overly indulging the impulse to apply pressure, Jack’s team engaged, befriended, and enlisted. They adopted two building blocks of change management (Basford & Schaninger, 2016). First, they engaged in ‘fostering understanding and conviction’—by way of following through on the training process. Second, they mobilized by sharing the rewards of success—thus implementing the tactic of ‘reinforcing with formal mechanisms.’ In particular, they provided incentives structured in terms of short-term success, since improvements to bottom-line efficiency would take some time to materialize. “Because those early transition days are not necessarily profitable, we had to use a balanced scorecard that would reward managers who achieved leading indicators,” Jack explains. “If you’re doing these leading things that are in your control, how can the lagging indicators of dollars saved not follow?” (personal communication, September 20, 2022)
The team implemented scorecards that reported on staff adherence to the improved procedures, flagging when there were more than a small number of overrides or when drivers would have to wait for their truck to finish loading and depart late. Only after a passing grade would the shipping center ‘graduate’ and the training team leave. These metrics spoke well to the sensibilities of staff, sidestepping the commonplace data science mistake of instead leaning on more technical metrics. As Katie Malone put it, “The quantities that data scientists are trained to optimize, the metrics they use to gauge progress on their data science models, are fundamentally useless to and disconnected from business stakeholders without heavy translation” (Malone, 2020).
This performance-management tactic worked. It increased the ratio of centralized decisions that were made electronically to decentralized decisions made physically, on the fly. Off came the training wheels.
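A leading-indicator scorecard of the kind described above could be sketched as follows. The article does not give the actual metrics or cutoffs, so the field names, the 2% override limit, and the zero-late-departures limit are all hypothetical values chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class DailyLoadingRecord:
    central_decisions: int  # assignments followed as planned at the central console
    floor_overrides: int    # assignments changed physically while loading
    late_departures: int    # trucks that departed late


def override_rate(record: DailyLoadingRecord) -> float:
    """Share of all assignment decisions that were overridden on the floor."""
    total = record.central_decisions + record.floor_overrides
    return record.floor_overrides / total if total else 0.0


def passes_scorecard(
    record: DailyLoadingRecord,
    max_override_rate: float = 0.02,  # hypothetical cutoff
    max_late_departures: int = 0,     # hypothetical cutoff
) -> bool:
    """Return True when both leading indicators are within their limits."""
    return (
        override_rate(record) <= max_override_rate
        and record.late_departures <= max_late_departures
    )


# Hypothetical days at one shipping center.
good_day = DailyLoadingRecord(central_decisions=16_300, floor_overrides=200, late_departures=0)
bad_day = DailyLoadingRecord(central_decisions=15_000, floor_overrides=1_500, late_departures=3)

print(passes_scorecard(good_day), passes_scorecard(bad_day))
```

Both indicators are within the control of the loading staff and their managers, which is the property Jack describes: they can be measured and rewarded long before dollar savings show up.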
Achieving these quicker, incremental wins changed the conversation, gaining renewed support from the top. The budget and available resources nudged up and the training team grew to cover more shipping centers. A typical shipping center required five training personnel working onsite for many weeks. To meet the demands of this full-scale change-management process across the nation’s shipping centers, Jack's deployment team ultimately grew to about 450 (and later to 700 for the ORION deployment described below).
Package Flow Technology gained status as a success, credited with saving 85 million miles annually. The press was ready to congratulate rather than eviscerate. InformationWeek even placed the project atop its annual “20 Great Ideas to Steal” list (O’Neill, 2013).
This great gain came from the deployment of the package-prediction model in combination with other related improvements, such as centralizing a shipping center’s package-delivery decisions. Jack informally credits the predictive model itself with an estimated 10% to 25% of these wins, although it is hard to separate the contributions of mutually interdependent innovations.
Jack’s optimization work continued, expanding beyond truck-to-package assignments to also streamline driver routes with prescribed, turn-by-turn driving instructions. Not only did this directly improve the use of driver time and truck miles, it also improved how well the package-prediction model was being leveraged: Optimized package assignments (based on delivery prediction) gave each truck an area that the driver could potentially cover efficiently—prescribing the route meant they would actually do so.
This second optimization system, On-Road Integrated Optimization and Navigation (ORION), operates in conjunction with PFT. The overall efficiency gains compounded further, ultimately saving the company $350+ million, 185 million miles, 8 million gallons of fuel, and 185,000 metric tons of emissions per year (Siegel, 2017).
It is fair to say that Jack shot the moon. His work has received over a dozen industry awards and several high-profile TV and magazine spotlights.
For a machine learning project to deploy, the stars must be aligned; many pitfalls must be avoided. For example, predictive models must be sufficiently transparent and understandable in order to satisfy stakeholders (Siegel, 2021), decision makers must be ramped up on the particular metrics that report on model performance (Malone, 2020), and the model must be trained on representative data that reflects the scenarios in which it will be deployed (Mitchell, 1997).
But even when these core technical requirements are met, an overarching fundamental requirement often remains unmet: a strategy for managing the change that model deployment entails. By generating a predictive model, the technology creates potential value. An organization then captures that value only by acting on the model’s predictions, changing operations by incorporating the predictions into decision-making. For many machine learning deployments, this change is possible only with the cooperation of operational staff. Such change-management challenges are not new in general, but when it comes to machine learning projects, the need to shrewdly manage that change is often overlooked (Kruhse-Lehtonen & Hofmann, 2020). The advanced modeling algorithm itself absorbs much of the project’s attention and seems to promise the moon. Machine learning delivers a rocket, but those in charge still must command its launch.
Many at the University of Virginia Darden School of Business—where I hold a one-year position as Bodily Bicentennial Professor of Analytics—contributed significantly to the writing of the article. I am grateful to Samuel Bodily, the professor emeritus after whom my position is named, who personally provided a great deal of input and guidance. He and various faculty at Darden and other UVA departments provided feedback on an earlier draft of this article, including Bill Scherer, Eric Tassone, Michael Albert, Alex Cowan, Sasa Zorc, and Rupert Freeman. Thanks also to Jack Levis for providing detailed information for the UPS story covered herein. Finally, HDSR's Editor-in-Chief and anonymous reviewers also provided vital feedback that improved this article.
Eric Siegel has no financial or non-financial disclosures to share for this article.
Basford, T., & Schaninger, B. (2016, April 11). The four building blocks of change. McKinsey Quarterly.
Bean, R. (2021, February 5). Why is it so hard to become a data-driven company? Harvard Business Review. https://hbr.org/2021/02/why-is-it-so-hard-to-become-a-data-driven-company
Bodily, S., & Allen, M. (1999). A dialogue process for choosing value-creating strategies. Interfaces, 29(6), 16–28. https://doi.org/10.1287/inte.29.6.16
Brynjolfsson, E., & McAfee, A. (2017, July 18). The business of artificial intelligence. Harvard Business Review. https://hbr.org/2017/07/the-business-of-artificial-intelligence
Chafkin, M. (2022, October 5). Even after $100 billion, self-driving cars are going nowhere. Bloomberg. https://www.bloomberg.com/news/features/2022-10-06/even-after-100-billion-self-driving-cars-are-going-nowhere
Davenport, T., & Malone, K. (2021). Deployment as a critical business data science discipline. Harvard Data Science Review, 3(1). https://doi.org/10.1162/99608f92.90814c32
Kruhse-Lehtonen, U., & Hofmann, D. (2020). How to define and execute your data and AI strategy. Harvard Data Science Review, 2(3). https://doi.org/10.1162/99608f92.a010feeb
Lohr, S. (2021, July 16). What ever happened to IBM’s Watson? The New York Times. https://www.nytimes.com/2021/07/16/technology/what-happened-ibm-watson.html
Malone, K. (2020). When translation problems arise between data scientists and business stakeholders, revisit your metrics. Harvard Data Science Review, 2(1). https://doi.org/10.1162/99608f92.c2fc310d
Mitchell, T. (1997). Machine learning. McGraw Hill. http://www.cs.cmu.edu/~tom/mlbook.html
O'Neill, S. (2013, August 27). 20 great ideas to steal in 2013. InformationWeek. https://www.informationweek.com/government/20-great-ideas-to-steal-in-2013
Rosencrance, L. (2005, February 24). New Package Flow Technology not delivering at UPS. Computerworld. https://www.computerworld.com/article/2568868/new-package-flow-technology-not-delivering-at-ups.html
Ross, M., & Taylor, J. (2021, November 10). Managing AI decision-making tools. Harvard Business Review. https://hbr-org.cdn.ampproject.org/c/s/hbr.org/amp/2021/11/managing-ai-decision-making-tools
Siegel, E. (2017, June 7). Wise practitioner – Predictive Analytics Interview Series: Jack Levis at UPS. The Machine Learning Times. https://www.predictiveanalyticsworld.com/machinelearningtimes/wise-practitioner-predictive-analytics-interview-series-jack-levi-ups/8699/
Siegel, E. (2018, October 5). Three common mistakes that can derail your team’s predictive analytics efforts. Harvard Business Review. https://hbr.org/2018/10/3-common-mistakes-that-can-derail-your-teams-predictive-analytics-efforts
Siegel, E. (2021, March 5). Explainable machine learning, model transparency, and the right to explanation. The Machine Learning Times. https://www.predictiveanalyticsworld.com/machinelearningtimes/explainable-machine-learning-model-transparency-and-the-right-to-explanation/12006/
Siegel, E. (2022, January 17). Models are rarely deployed: An industry-wide failure in machine learning leadership. KDnuggets. https://www.kdnuggets.com/2022/01/models-rarely-deployed-industrywide-failure-machine-learning-leadership.html
Wan, Y. (2022, June 19–24). Keynote: Machine learning makes UPS’s Smart Logistics Network smarter [Conference session]. In Predictive Analytics World for Business 2022, Las Vegas, NV. https://www.predictiveanalyticsworld.com/business/2022/agenda/#session98711
White, A., Smets, M., & Canwell, A. (2022, July 18). Organizational transformation is an emotional journey. Harvard Business Review. https://hbr.org/2022/07/organizational-transformation-is-an-emotional-journey
©2023 Eric Siegel. This article is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the article.