Column Editor's note: Games like basketball and football have had limited success using statistical analysis to improve game strategy and evaluate players. Brian Macdonald explains that the availability of detailed player tracking data in sports allows researchers and team analysts to dive into the deep end of data science.
Keywords: decision analysis, expected points, performance analysis, player tracking data, risk analysis, sports analytics
Data science has become more prominent in many industries in recent years, and sports is no different. The book and movie Moneyball, about how the 2002 Oakland Athletics used data analysis to rethink how to build a team and make in-game decisions, helped accelerate the adoption of data science in sports and helped popularize analytics. Fast forward to today and the data available are far more detailed than what was available to the Athletics in 2002, and the analytical methods are far more sophisticated. In several sports leagues, player-tracking data are available, where the locations of all players and the ball or puck are recorded several times per second throughout the game using camera-based or chip-based technologies.
For professional basketball and (American) football, the availability of player-tracking data is especially compelling. Because of the simultaneous movements of multiple agents and the dynamic spatial relationships and interactions among them, spatiotemporal information is essential to fully understand the evolution of a play in these sports.
For years, box-score statistics like points, rebounds, and assists in basketball, or passing yards, touchdowns, and interceptions in football were the standard. In the late 1990s and early 2000s, play-by-play data became available for many sports. These data contained basic information on important events that occurred during a game, including which players were involved, the time the events happened, and in some cases player substitution information, and opened the door to a variety of new analyses.
In football, for example, play-by-play data made it possible to develop the idea of expected points (EP). Analysts used data containing down, yards to go for a first down, yard line, and the number of points that were ultimately scored on that drive, to build EP models that provide an estimate of the number of points a team is expected to earn in the current drive, given the current game state (Burke, 2009; Carter & Machol, 1971). EP models serve as the foundation for a variety of analyses, such as determining whether it is best to punt, kick a field goal, or go for the 1st down when it is 4th and 1 on the opponent’s 40-yard line, as well as player metrics like Total Quarterback Rating (Oliver, 2011; Katz & Burke, 2016).
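As a rough illustration of how an EP estimate can be built from play-by-play records, the sketch below averages the points scored on the drive for every play that shares the same game state. The rows are made up, the 10-yard field zones are a hypothetical binning choice, and published EP models are considerably more refined than a simple grouped average.

```python
from collections import defaultdict

# Toy play-by-play rows, made up for illustration:
# (down, yards_to_go, yards_from_opponent_goal, points_scored_on_drive)
plays = [
    (1, 10, 75, 0),
    (1, 10, 75, 7),
    (1, 10, 75, 0),
    (1, 10, 75, 3),
    (4, 1, 40, 7),
    (4, 1, 40, 0),
    (4, 1, 40, 3),
    (4, 1, 40, 3),
]


def field_zone(yards_from_goal, width=10):
    """Bucket field position into zones of `width` yards (hypothetical choice)."""
    return (yards_from_goal // width) * width


def expected_points_table(rows):
    """Average drive points for each (down, yards to go, field zone) state."""
    totals = defaultdict(lambda: [0.0, 0])  # state -> [sum of points, count]
    for down, to_go, yards_from_goal, points in rows:
        state = (down, to_go, field_zone(yards_from_goal))
        totals[state][0] += points
        totals[state][1] += 1
    return {state: total / n for state, (total, n) in totals.items()}


ep = expected_points_table(plays)
print(ep[(1, 10, 70)])  # 2.5
print(ep[(4, 1, 40)])   # 3.25
```

With real data, the same lookup would give the EP of the current state, which can then be compared with the EP of the states that a punt, a field goal attempt, or a fourth-down try would likely lead to.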
Advances in technology and in statistical and machine learning algorithms made it possible to collect data containing the exact locations of every player and the ball at regularly spaced time points. Teams and leagues started taking advantage of these capabilities to go beyond the information provided by play-by-play data. The technology used to collect these data depends on the league. For the National Basketball Association (NBA), for example, multiple cameras at fixed locations and orientations in the arena are pointed toward the playing surface to record the game play. Locations of the players and the ball are extracted from those video feeds at a rate of 25 times per second. For the National Football League (NFL), chips in shoulder pads and the ball are used to track locations 10 times per second. The locations of the players and the ball can then be used as the basis for detecting in-game actions such as screens, drives, offensive linemen blocks, and so on (Burke, 2018; Keshri et al., 2019; McQueen et al., 2014). Computer vision techniques can be used to extract data from broadcast video as well (Johnson, 2020). Basic information that was previously unavailable, like running speeds and distances covered throughout a game, can be extracted from tracking data. These new opportunities opened the door for more insightful analysis that deepened understanding of both games.
An important issue in free-flowing sports like basketball is how to assess a player’s positioning and decision making on offense and defense. Cervone et al. (2014, 2016) developed a method called the expected possession value (EPV) that addresses this question. Their approach estimates the expected number of points that a team will score by the end of the possession given the location information of the players and the ball throughout every moment of the possession. The EPV can be understood as a weighted average of the value associated with each possible decision the ball handler could make (pass, shoot, dribble, etc.), weighted by the probability that the ball handler will make that decision. The value associated with each possible decision, and the probability of that decision, are each modeled separately.
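The weighted-average reading of EPV can be made concrete with a small sketch. The decision probabilities and values below are hypothetical placeholders; in the approach described above, each would come from its own fitted model rather than being typed in by hand.

```python
def expected_possession_value(decisions):
    """EPV as the sum over decisions of P(decision) * value(decision).

    `decisions` maps a decision name to (probability, value in points).
    """
    total_prob = sum(p for p, _ in decisions.values())
    if abs(total_prob - 1.0) > 1e-9:
        raise ValueError("decision probabilities must sum to 1")
    return sum(p * v for p, v in decisions.values())


# Hypothetical snapshot of one moment in a possession.
snapshot = {
    "shoot": (0.20, 1.10),
    "pass": (0.50, 0.95),
    "dribble": (0.30, 0.90),
}

epv = expected_possession_value(snapshot)
print(round(epv, 3))  # 0.965
```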
Several metrics can be derived using this framework. One can estimate the value of each localized action (a screen, a drive, finding open space on the court, etc.) that occurs during a possession by computing the expected points before and after that action. One can also evaluate the decision-making ability of a player by measuring how often the player made the ‘best’ decision according to EPV. Suppose a player becomes wide open under the basket. The EPV is high at that moment, but if the ball handler does not pass the ball to that open player before the defense recovers, the EPV decreases. An increase in EPV could be attributed to the player who found open space on the court, and a decrease in EPV could be attributed to the decision that the ball handler made to hold onto the ball, or the failure to even recognize the open teammate.
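A minimal sketch of this before-and-after bookkeeping follows, using the wide-open-teammate scenario. The EPV readings are invented for illustration, and attributing each change to a single event is a simplification.

```python
# Hypothetical EPV readings (in points) at successive moments of one possession,
# each paired with a label for what happened just before the reading.
timeline = [
    ("possession starts", 0.95),
    ("teammate finds open space under the basket", 1.30),
    ("ball handler holds the ball and the defense recovers", 1.00),
]


def epv_changes(readings):
    """Return (event, EPV after minus EPV before) for each event after the first."""
    changes = []
    for (_, before), (event, after) in zip(readings, readings[1:]):
        changes.append((event, round(after - before, 2)))
    return changes


for event, delta in epv_changes(timeline):
    print(f"{delta:+.2f}  {event}")
```

Here the positive change would be credited to the player who got open, and the negative change to the ball handler's decision not to pass.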
One metric proposed in Cervone et al. (2014, 2016) was Expected Possession Value Added (EPVA), which is a measure of how much EPV a player contributes, relative to an average player, based on all of his actions when he is handling the ball. A player who routinely has a positive EPVA can be understood as making good decisions. Unlike once-popular player measures based on play-by-play data, such as “Adjusted Plus-Minus” in basketball (Rosenbaum, 2004; Ilardi & Barzilai, 2008; Sill, 2010; Engelmann, 2017) and other sports (Clark et al., 2020; Gramacy et al., 2013; Macdonald, 2011; Matano et al., 2018; Sabin, 2020; Schuckers and Curro, 2013; Thomas et al., 2013), EPVA focuses on all micro-decisions and micro-actions taken by a player.
Sicilia et al. (2019) built an alternative model for within-play expected points that differed from the EPV model in two important ways. First, instead of focusing on expected points directly, they estimated the probabilities that one of four terminal actions (field goal attempt, shooting foul, nonshooting foul, and turnover) would occur, given the locations of the players and ball leading up to that moment in the possession. Each of the terminal actions has an associated point value, which enables the algorithm to convert these probabilities to expected points.
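The conversion from terminal-action probabilities to expected points is a probability-weighted sum, sketched below. The point values and probabilities are hypothetical; the values actually used by Sicilia et al. (2019) are not reproduced in this article.

```python
# Hypothetical point value for each terminal action (an assumption, not the
# values from Sicilia et al., 2019).
POINT_VALUES = {
    "field_goal_attempt": 1.0,
    "shooting_foul": 1.5,
    "nonshooting_foul": 0.0,
    "turnover": 0.0,
}


def expected_points(terminal_probs, point_values=POINT_VALUES):
    """Sum of P(terminal action) * point value.

    Any leftover probability (no terminal action in the window) adds zero.
    """
    if sum(terminal_probs.values()) > 1.0 + 1e-9:
        raise ValueError("terminal-action probabilities cannot exceed 1 in total")
    return sum(p * point_values[action] for action, p in terminal_probs.items())


# Hypothetical model output at one moment of a possession.
probs = {
    "field_goal_attempt": 0.30,
    "shooting_foul": 0.04,
    "nonshooting_foul": 0.03,
    "turnover": 0.05,
}

print(round(expected_points(probs), 2))  # 0.36
```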
This approach enables an analysis of not only the expected points during a play, but also a more granular treatment of possible outcomes and the risk level associated with them. A play that has a higher expected point value associated with it would be preferred in typical game situations. But late in the game, when the team with possession has a lead, a play with a slightly lower expected point value, but a much lower probability of a turnover, for example, may be preferred. A team that has a lead sometimes prefers lower variance outcomes, and this model can help assess how well players make in-game decisions in these situations. This type of model could potentially be expanded to estimate in-play probability of winning the game for the team possessing the ball, as opposed to in-play expected points, which would naturally incorporate information about score and time remaining. Skinner and Goldman (2017) highlight this kind of risk/reward tradeoff in basketball.
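The tradeoff described above can be sketched as a simple selection rule. Everything here is hypothetical: the play names, the numbers, and the tolerance rule for a team protecting a lead are illustrative assumptions, not part of the model in Sicilia et al. (2019).

```python
# Hypothetical summaries of two play options.
plays = {
    "drive_and_kick": {"expected_points": 1.05, "turnover_prob": 0.16},
    "post_up": {"expected_points": 1.00, "turnover_prob": 0.08},
}


def preferred_play(options, protecting_lead=False, ep_tolerance=0.10):
    """Pick the highest expected points, unless protecting a lead.

    When protecting a lead, pick the lowest turnover probability among
    plays within `ep_tolerance` points of the best expected points.
    """
    if not protecting_lead:
        return max(options, key=lambda name: options[name]["expected_points"])
    best_ep = max(o["expected_points"] for o in options.values())
    close_enough = [
        name for name, o in options.items()
        if best_ep - o["expected_points"] <= ep_tolerance
    ]
    return min(close_enough, key=lambda name: options[name]["turnover_prob"])


print(preferred_play(plays))                        # drive_and_kick
print(preferred_play(plays, protecting_lead=True))  # post_up
```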
The other main difference is that while the time window used with EPV is the entire possession, the alternative approach estimates the probabilities that these terminal events will occur within a time window of about 5 s. With this approach, the immediate value of an action can be more easily isolated. A common offensive tactic like a pick-and-roll at the beginning of a possession may have been ineffective, but if the eventual result of the possession is a successful dunk 20 s later, the pick-and-roll could get partial credit under the EPV model for that result unless a shorter time window were used in the model.
In Figure 1, using data provided by the authors, we demonstrate player and ball locations, expected points, and probabilities of terminal events evolving throughout a possession according to the models in Sicilia et al. (2019).
Early in the possession, when the ball is being dribbled up the court, the probability that no terminal event will occur in the next 5 seconds is very high, and correspondingly the expected points are very low. As the offense begins to execute passes and drives in the half-court offense, the probabilities of terminal events and expected points begin to move up and down based on the effectiveness of those actions. For example, the drive to the basket early in the possession temporarily increases the probability of a field goal attempt and the expected points, and both decrease again once the drive does not successfully lead to a field goal attempt. See Cervone et al. (2013, 2015) for similar animations from the EPV model.
Some of the recent impetus in developing methods for analyzing player tracking data in football came directly from the NFL. The NFL promoted the use of player-tracking data to analyze the sport by conducting the inaugural Big Data Bowl competition in 2019 (NFL Football Operations, 2019). Contestants used player location data to answer a variety of questions about on-field play. In the 2020 iteration of the competition, contestants tackled a more targeted question: Knowing the location and velocity of the players on the field at the time of a handoff, how many yards do we expect the ball-carrier to gain (NFL Football Operations, 2020)?
The expected yards gained by a ball carrier can be estimated throughout an entire play based on the locations and trajectories of the ball-carrier and the other players on the field. This was the focus of Yurko et al. (2020). After estimating expected yards gained, the authors used a model for EP (Yurko et al., 2019) to convert in-play expected yards gained to in-play expected points. Using in-play expected points and the current game state, the authors developed in-play estimates for winning the entire game.
One difference in their approach, compared to the in-play expected points models for basketball, is that their choice of model yields not only an estimate of expected yards gained, but also of the full probability distribution for yards gained. In other words, their approach determines the range of possible values of yards gained along with the likelihood of their occurrence. This full distribution gives information about the uncertainty in the final outcome. One can estimate quantities such as the probability of a loss of yards, of gaining a first down, of a 20-plus–yard gain, or of a touchdown.
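To show how such quantities follow from a full distribution, the sketch below uses a made-up discrete distribution over yards gained on a single play. The numbers are placeholders, not output from the model in Yurko et al. (2020).

```python
# Hypothetical probability mass over yards gained on one play.
yards_dist = {-2: 0.10, 0: 0.15, 3: 0.30, 6: 0.20, 12: 0.15, 25: 0.07, 40: 0.03}


def summarize(dist, yards_to_first_down, yards_to_goal):
    """Expected yards plus the probabilities of several outcomes of interest."""
    if abs(sum(dist.values()) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")

    def prob(condition):
        return sum(p for yards, p in dist.items() if condition(yards))

    return {
        "expected_yards": sum(yards * p for yards, p in dist.items()),
        "p_loss": prob(lambda y: y < 0),
        "p_first_down": prob(lambda y: y >= yards_to_first_down),
        "p_20_plus": prob(lambda y: y >= 20),
        "p_touchdown": prob(lambda y: y >= yards_to_goal),
    }


summary = summarize(yards_dist, yards_to_first_down=10, yards_to_goal=40)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

For this toy distribution the expected gain is 6.65 yards, with a 10% chance of a loss, a 25% chance of a first down, a 10% chance of a gain of 20 or more yards, and a 3% chance of a touchdown.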
Figure 2 shows a touchdown run by Leonard Fournette, along with the corresponding expected yards gained and the full distribution of yards gained, using data provided by the authors.
The expected yards gained increases throughout the play. The distribution reveals additional information by showing the range of likely outcomes evolving over time. Near the start of the play, the distribution is bimodal, with one mode near the 30-yard line and one near the goal line. This is a wide range of outcomes and indicates that this play is substantially different than a play with a unimodal distribution centered at the 40-yard line, even though it has the same expected end-of-play yard line. By the middle of the play, the distribution becomes unimodal with a single peak at the goal line. This indicates that the most likely outcome is a touchdown, and there is very little chance Fournette is tackled immediately.
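The point that two plays can share an expected end-of-play yard line yet carry very different risk can be checked with a few lines of code. The samples below are invented, and field position is measured in yards from the offense's own goal line (so 100 is a touchdown), a convention assumed here for convenience.

```python
from statistics import mean, pstdev

# Hypothetical end-of-play positions, in yards from the offense's own goal line.
bimodal_play = [28, 30, 32, 100, 100, 100]  # stopped early or scores
unimodal_play = [63, 64, 65, 65, 66, 67]    # tightly clustered outcomes

for name, outcomes in [("bimodal", bimodal_play), ("unimodal", unimodal_play)]:
    print(f"{name}: mean={mean(outcomes):.1f}, spread={pstdev(outcomes):.1f}")
```

Both plays have a mean of 65, but the spread of the bimodal play is far larger, which is exactly the information a single expected value hides.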
This additional information can be helpful when evaluating the risk/reward tradeoff of a play, or decisions made within a play. On defense, a safety’s decision-making and initial trajectory after the snap could be evaluated using expected yards or expected points. A running back could be evaluated based on the comparison of his actual yards gained to the expected yards gained at the time of the handoff. These types of analyses cannot be done using play-by-play data because of the spatiotemporal information needed to provide enough context to better understand a play.
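The running back comparison amounts to averaging the difference between actual and expected yards over a set of carries. A minimal sketch, with invented carries and expected values standing in for model output:

```python
# Hypothetical carries: (actual yards gained, expected yards at the handoff).
carries = [(4, 3.5), (12, 4.0), (-1, 2.5), (6, 5.0)]


def yards_over_expected(runs):
    """Average of (actual yards - expected yards) across carries."""
    if not runs:
        raise ValueError("need at least one carry")
    return sum(actual - expected for actual, expected in runs) / len(runs)


print(yards_over_expected(carries))  # 1.5
```

A positive average suggests the runner gained more than the situation at the handoff predicted, though a sample this small would say little in practice.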
Data have become detailed enough, and analysis sophisticated enough, for expected points to be estimated at any moment of an NBA or NFL game, and that is just the tip of the iceberg. An expected possession value model has been developed for soccer (Fernández et al., 2019), and these models can be extended to other sports as well. The player-tracking data required for these models have recently become available, or will soon be available, for several other leagues in North America, including the National Hockey League (Cotsonika, 2020), college football (Cohen, 2020), men’s and women’s college basketball (Medcalf, 2019), and Major League Soccer (Novy-Williams, 2020). Tracking data have also been analyzed in baseball (Higuchi et al., 2013; Jinji et al., 2011; Nathan, 2003), esports (Maymin, 2018), and individual sports like tennis (Giles et al., 2019; Wei et al., 2013) and golf (Arastey, 2020; Broadie, 2014; Broadie and Shin, 2014).
Player-tracking data can be used to address numerous other problems. For example, in football, these data have been used to analyze fourth-down decisions more precisely (Lopez, 2020), automatically classify receiver routes (Burke, 2019a; Chu et al., 2019), detect defensive pass–coverage formations (Burke, 2019b; Dutta et al., 2020), measure quarterback decision-making (Burke, 2019c), and estimate the completion probability for each receiver throughout a play (Deshpande and Evans, 2019). Even with the greater flexibility of analyzing the dynamics of a game with player-tracking data, care still needs to be taken in drawing conclusions. For example, because tracking data are observational, causal conclusions based on their analysis may not be reliable. Terner and Franks (2020) provide a discussion of the limitations of using tracking data, with a particular emphasis on the need for causal inference techniques in sports analytics.
Other detailed data have become more prevalent and add to the breadth and depth of possible analyses. For example, some leagues allow players to use wearable technology that records biometric information throughout the game that can potentially be used to enhance and improve models by accounting for even finer-level detail. Because of the prevalence of such data, the development of novel models and metrics, and wider acceptance of the benefits of data-based analysis, sports analytics is becoming a more natural part of decision-making processes in sports.
NFL Football Operations (2019). The NFL’s inaugural Big Data Bowl. NFL. https://operations.nfl.com/the-game/big-data-bowl/2019-big-data-bowl/
NFL Football Operations (2020). 2020 Big Data Bowl. NFL. https://operations.nfl.com/the-game/big-data-bowl/
Arastey, G. M. (2020, January 23). The increasing presence of data analytics in golf. Sport Performance Analysis. https://www.sportperformanceanalysis.com/article/increasing-presence-of-data-analytics-in-golf
Broadie, M. (2014). Every shot counts: Using the revolutionary strokes gained approach to improve your golf performance and strategy. Avery.
Broadie, M., & Shin, D. (2014, October). Golf analytics: A random putting model and its applications to optimal targeting strategy and attribution analysis. Wadden Golf Academy. http://waddengolfacademy.com/putting/Broadie%20Shin%20A%20Random%20Putting%20Model.pdf
Burke, B. (2009, December 16). Expected point values. Advanced Football Analytics. http://archive.advancedfootballanalytics.com/2009/12/expected-point-values.html
Burke, B. (2018, October 5). We created better pass-rusher and pass-blocker stats: How they work. ESPN. https://www.espn.com/nfl/story/_/id/24892208/creating-better-nfl-pass-blocking-pass-rushing-stats-analytics-explainer-faq-how-work
Burke, B. (2019a, December 16). [Tweet]. https://twitter.com/bburkeESPN/status/1206751729947152384
Burke, B. (2019b, September 16). [Tweet]. https://twitter.com/bburkeESPN/status/1173775137876992007
Burke, B. (2019c). DeepQB: Deep learning with Player Tracking to quantify quarterback decision-making & performance. 2019 MIT Sloan Sports Analytics Conference, March 1-2, 2019. Boston, MA. [Paper] http://www.sloansportsconference.com/wp-content/uploads/2019/02/DeepQB.pdf
Carter, V., & Machol, R. E. (1971). Technical note—Operations research on football. Operations Research, 19(2), 541–544. https://doi.org/10.1287/opre.19.2.541
Cervone, D., D’Amour, A., Bornn, L., & Goldsberry, K. (2014, February 28–March 1). POINTWISE: Predicting points and valuing decisions in real time with NBA optical tracking data. 2014 MIT Sloan Sports Analytics Conference, February 28-March 1, 2014. Boston, MA. [Paper] http://www.sloansportsconference.com/wp-content/uploads/2018/09/cervone_ssac_2014.pdf
Cervone, D., D’Amour, A., Bornn, L., & Goldsberry, K. (2016, August 18). A multiresolution stochastic process model for predicting basketball possession outcomes. Journal of the American Statistical Association, 111(514), 585–599. https://doi.org/10.1080/01621459.2016.1141685
Cervone, D., D'Amour, A., Bornn, L., & Goldsberry, K. (2013, September 21). State of Transition: Estimated Real-Time Expected Possession Value in the NBA with a Spatiotemporal Transition Model and Player Tracking Data. [Video]. YouTube. https://www.youtube.com/watch?v=2fYa7M_H3S4
Cervone, D., D'Amour, A., Bornn, L., & Goldsberry, K. (2015). GitHub. https://github.com/dcervone/EPVDemo/tree/master/gifs
Chu, D., Reyers, M., Thomson, J., & Wu, L. Y. (2019). Route identification in the National Football League. Journal of Quantitative Analysis in Sports, 16(2), 121–132. https://doi.org/10.1515/jqas-2019-0047
Clark, N., Macdonald, B., & Kloo, I. (2020, August 3). A Bayesian adjusted plus-minus analysis for the esport Dota 2. Journal of Quantitative Analysis in Sports. Advance online publication. https://doi.org/10.1515/jqas-2019-0103
Cohen, A. (2020, January 9). Sportlogiq and Telemetry Sports team up to track college football players. Sporttechie. https://www.sporttechie.com/sportlogiq-telemetry-sports-partnership-college-football-players-tracking
Cotsonika, N. J. (2020, March 3). Puck, Player Tracking in final testing stage before Stanley Cup Playoffs. NHL. https://www.nhl.com/news/puck-player-tracking-technology-unveiled-during-2020-postseason/c-315806398
Deshpande, S. K., & Evans, K. (2019, November 16). Expected hypothetical completion probability. Journal of Quantitative Analysis in Sports, 16(2), 85–94. https://doi.org/10.1515/jqas-2019-0050
Dutta, R., Yurko, R., & Ventura, S. L. (2020, May 29). Unsupervised methods for identifying pass coverage among defensive backs with NFL player tracking data. Journal of Quantitative Analysis in Sports, 16(2), 143–161. https://doi.org/10.1515/jqas-2020-0017
Engelmann, J. (2017). Possession-based player performance analysis in basketball (adjusted +/– and related concepts). In J. Albert, M. E. Glickman, T. B. Swartz, & R. H. Koning (Eds.), Handbook of statistical methods and analyses in sports (pp. 231–244). Chapman and Hall/CRC. https://www.taylorfrancis.com/books/e/9781315166070/chapters/10.1201/9781315166070-16
Fernández, J., Bornn, L., & Cervone, D. (2019). Decomposing the immeasurable sport: A deep learning expected possession value framework. 2019 MIT Sloan Sports Analytics Conference, March 1-2, 2019. Boston, MA. [Paper] http://www.sloansportsconference.com/wp-content/uploads/2019/02/Decomposing-the-Immeasurable-Sport.pdf
Giles, B., Kovalchik, S., & Reid, M. (2019). A machine learning approach for automatic detection and classification of changes of direction from player tracking data in professional tennis. Journal of Sports Sciences, 38(1), 106–113. https://doi.org/10.1080/02640414.2019.1684132
Gramacy, R. B., Jensen, S. T., & Taddy, M. (2013). Estimating player contribution in hockey with regularized logistic regression. Journal of Quantitative Analysis in Sports, 9(1), 97–111. https://doi.org/10.1515/jqas-2012-0001
Higuchi, T., Morohoshi, J., Nagami, T., Nakata, H., & Kanosue, K. (2013). The effect of fastball backspin rate on baseball hitting accuracy. Journal of Applied Biomechanics, 29(3), 279–284. https://doi.org/10.1123/jab.29.3.279
Ilardi, S., & Barzilai, A. (2008). Adjusted plus-minus ratings: New and improved for 2007–2008. 82games. http://www.82games.com/ilardi2.htm
Jinji, T., Sakurai, S., & Hirano, Y. (2011). Factors determining the spin axis of a pitched fastball in baseball. Journal of Sports Sciences, 29(7), 761–767. https://doi.org/10.1080/02640414.2011.553963
Johnson, N. (2020). Extracting Player Tracking data from video using non-stationary cameras and a combination of computer vision techniques. 2020 MIT Sloan Sports Analytics Conference, March 6-7, 2020. Boston, MA. [Paper] http://www.sloansportsconference.com/content/extracting-player-tracking-data-from-video-using-non-stationary-cameras-and-a-combination-of-computer-vision-techniques/
Katz, S., & Burke, B. (2016, September 8). How is Total QBR calculated? We explain our quarterback rating. ESPN. https://www.espn.com/nfl/story/_/id/6833215/explaining-statistics-total-quarterback-rating
Keshri, S., Oh, M.-h., Zhang, S., & Iyengar, G. (2019). Automatic event detection in basketball using HMM with energy based defensive assignment. Journal of Quantitative Analysis in Sports, 15(2), 141–153. https://doi.org/10.1515/jqas-2017-0126
Lopez, M. (2020). Bigger data, better questions, and a return to fourth down behavior: An introduction to a special issue on tracking data in the National Football League. Journal of Quantitative Analysis in Sports, 16(2), 73–79. https://doi.org/10.1515/jqas-2020-0057
Macdonald, B. (2011). A regression-based adjusted plus-minus statistic for NHL players. Journal of Quantitative Analysis in Sports, 7(3). https://doi.org/10.2202/1559-0410.1284
Matano, F., Richardson, L. F., Pospisil, T., Eubanks, C., & Qin, J. (2018). Augmenting adjusted plus-minus in soccer with FIFA ratings. ArXiv. https://arxiv.org/abs/1810.08032
Maymin, P. (2018). An open-sourced optical tracking and advanced esports analytics platform for League of Legends. 2018 MIT Sloan Sports Analytics Conference, February 23-24, 2018. [Paper] http://www.sloansportsconference.com/wp-content/uploads/2018/02/1002.pdf
McQueen, A., Wiens, J., & Guttag, J. (2014). Automatically recognizing on-ball screens. 2014 MIT Sloan Sports Analytics Conference, February 28-March 1, 2014. [Paper] http://www.sloansportsconference.com/content/automatically-recognizing-on-ball-screens/
Medcalf, M. (2019, May 15). Mountain West embraces ShotTracker technology. ESPN. https://www.espn.com/mens-college-basketball/story/_/id/26755993/mountain-west-embraces-shottracker-technology
Nathan, A. M. (2003). Characterizing the performance of baseball bats. American Journal of Physics, 71(134). https://doi.org/10.1119/1.1522699
Novy-Williams, E. (2020, February 26). Major League Soccer to track player movements for TV, gamblers. Bloomberg. https://www.bloomberg.com/news/articles/2020-02-26/major-league-soccer-to-track-player-movements-for-tv-gamblers
Oliver, D. (2011, August 4). Guide to the Total Quarterback Rating. ESPN. https://www.espn.com/nfl/story/_/id/6833215/explaining-statistics-total-quarterback-rating
Rosenbaum, D. T. (2004, April 30). Measuring how NBA players help their teams win. 82games. http://www.82games.com/comm30.htm
Sabin, P. (2020, March 17). Playing catchup: Estimating player positional value in (American) football [Video]. YouTube. https://www.youtube.com/watch?v=oZsFyPGij0U
Schuckers, M., & Curro, J. (2013). Total Hockey Rating (THoR): A comprehensive statistical rating of National Hockey League forwards and defensemen based upon all on-ice events. 2013 MIT Sloan Sports Analytics Conference, March 1-2, 2013. Boston, MA. [Paper] http://www.sloansportsconference.com/wp-content/uploads/2013/Total%20Hockey%20Rating%20(THoR)%20A%20comprehensive%20statistical%20rating%20of%20National%20Hockey%20League%20forwards%20and%20defensemen%20based%20upon%20all%20on-ice%20events.pdf
Sicilia, A., Pelechrinis, K., & Goldsberry, K. (2019). DeepHoops: Evaluating micro-actions in basketball using deep feature representations of spatio-temporal data. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 2096–2104). Association for Computing Machinery. https://doi.org/10.1145/3292500.3330719
Sill, J. (2010). Improved NBA adjusted +/- using regularization and out-of-sample testing. 2010 MIT Sloan Sports Analytics Conference, March 6, 2010. Boston, MA. [Paper] http://www.sloansportsconference.com/wp-content/uploads/2015/09/joeSillSloanSportsPaperWithLogo.pdf
Skinner, B., & Goldman, M. (2017). Optimal strategy in basketball. In J. Albert, M. E. Glickman, T. B. Swartz, & R. H. Koning (Eds.), Handbook of statistical methods and analyses in sports (pp. 229–244). Chapman and Hall/CRC. https://arxiv.org/abs/1512.05652
Terner, Z., & Franks, A. (2020, July 22). Modeling player and team performance in basketball. ArXiv. https://arxiv.org/pdf/2007.10550.pdf
Thomas, A. C., Ventura, S. L., Jensen, S. T., & Ma, S. (2013). Competing process hazard function models for player ratings in ice hockey. Annals of Applied Statistics, 7(3), 1497–1524. https://doi.org/10.1214/13-AOAS646
Wei, X., Lucey, P., Morgan, S., & Sridharan, S. (2013). “Sweet-Spot”: Using spatiotemporal data to discover and predict shots in tennis. 2013 MIT Sloan Sports Analytics Conference, March 1-2, 2013. Boston, MA. [Paper] http://www.sloansportsconference.com/wp-content/uploads/2013/'Sweet-Spot'%20-%20Using%20Spatiotemporal%20Data%20to%20Discover%20and%20Predict%20Shots%20in%20Tennis.pdf
Yurko, R., Matano, F., Richardson, L. F., Granered, N., Pospisil, T., Pelechrinis, K., & Ventura, S. L. (2020). Going deep: Models for continuous-time within-play valuation of game outcomes in American football with tracking data. Journal of Quantitative Analysis in Sports, 16(2), 163–182. https://doi.org/10.1515/jqas-2019-0056
Yurko, R., Ventura, S., & Horowitz, M. (2019). nflWAR: A reproducible method for offensive player evaluation in football. Journal of Quantitative Analysis in Sports, 15(3), 163–183. https://doi.org/10.1515/jqas-2018-0010
This article is © 2020 by the author(s). The editorial is licensed under a Creative Commons Attribution (CC BY 4.0) International license (https://creativecommons.org/licenses/by/4.0/legalcode), except where otherwise indicated with respect to particular material included in the article. The article should be attributed to the authors identified above.