The Anatomy of American Football: Evidence from 7 Years of NFL Game Data

RESEARCH ARTICLE

Konstantinos Pelechrinis1*, Evangelos Papalexakis2

1 School of Information Sciences, University of Pittsburgh, Pittsburgh, PA, United States of America
2 Department of Computer Science and Engineering, University of California Riverside, Riverside, CA, United States of America

OPEN ACCESS

Citation: Pelechrinis K, Papalexakis E (2016) The Anatomy of American Football: Evidence from 7 Years of NFL Game Data. PLoS ONE 11(12): e0168716. doi:10.1371/journal.pone.0168716
Editor: Kimmo Eriksson, Mälardalen University, SWEDEN
Received: July 23, 2016
Accepted: November 23, 2016
Published: December 22, 2016
Copyright: © 2016 Pelechrinis, Papalexakis. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability Statement: All relevant data are available within the manuscript and deposited in Github: https://github.com/kpelechrinis/footballonomics.
Funding: The author(s) received no specific funding for this work.
Competing Interests: The authors have declared that no competing interests exist.

Abstract

How much does a fumble affect the probability of winning an American football game? How balanced should your offense be in order to increase the probability of winning by 10%? These are questions for which the coaching staff of National Football League teams have a clear qualitative answer. Turnovers are costly; turn the ball over several times and you will certainly lose. Nevertheless, what does “several” mean? How “certain” is certainly? In this study, we collected play-by-play data from the past 7 NFL seasons, i.e., 2009–2015, and we build a descriptive model for the probability of winning a game.
Although our model incorporates simple box score statistics, such as total offensive yards, number of turnovers, etc., its overall cross-validation accuracy is 84%. Furthermore, we combine this descriptive model with a statistical bootstrap module to build FPM (short for Football Prediction Matchup) for predicting future match-ups. The contribution of FPM lies in its simplicity and transparency, which do not come at the expense of the system’s performance. In particular, our evaluations indicate that our prediction engine performs on par with the current state-of-the-art systems (e.g., ESPN’s FPI and Microsoft’s Cortana). The latter are typically proprietary but, based on their publicly described components, they are significantly more complicated than FPM. Moreover, their proprietary nature does not allow for a head-to-head comparison in terms of the core elements of the systems, but it should be evident that the features incorporated in FPM are able to capture a large percentage of the observed variance in NFL games.

1 Introduction

While American football is viewed mainly as a physical game (and it surely is), it is at the same time probably one of the most strategic sports, a fact that makes it appealing even to an international crowd [1]. This has led people to analyze the game using data analytics methods and game theory. For instance, after the controversial last play call of Super Bowl XLIX, The Economist [2] argued, using appropriate data and game theory, that this play was rational and not that bad after all.

The ability to collect and analyze large volumes of data has, during the last few years, put forward a quantification-based approach to modeling and analyzing success in various sports. For example, pertinent to American football, Clark et al.
[3] analyzed the factors that affect the success of a field goal kick and, contrary to popular belief, they did not identify any situational factor (e.g., regular vs post season, home vs away, etc.) as being significant. In another direction, Pfitzner et al. [4] and Warner [5] studied models and systems for determining a successful betting strategy for NFL games, while the authors in [6] show that the much-discussed off-field misconduct of NFL players does not affect a team’s performance. Furthermore, the spatial information collected from the RFID sensors on NFL players has been used to evaluate quarterbacks’ decision-making ability [7], while efforts to assess the impact of individual offensive linemen on passing have been presented by Alamar and Weinstein-Gould [8]. Similarly, Correia et al. [9] analyzed the passing behavior of players in rugby, the sport most similar to American football. They found that the time required to close the gap between the first attacker and the defense explained 64% of the variance found in pass duration, and that this can further yield information about future pass possibilities.

Nevertheless, despite the availability of play data for American football and the proliferation of the sports analytics literature as well as the literature surrounding the NFL, there are only a few publicly available studies that have focused on predicting a game’s outcome. Furthermore, some of the existing models make strong theoretical assumptions that are hard to verify (e.g., the team strength factors obeying a first-order autoregressive process [10]). Closest to our work, Cohea and Payton developed a logistic regression model to understand the factors affecting an NFL game outcome [11]. The benefit of our model as compared to the one presented by Cohea and Payton [11] is that the number of explanatory variables we are using is much smaller, making it easy for a fan to follow.
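To make the idea of a logistic regression on a handful of box score statistics concrete, the following is a minimal sketch on synthetic data. It is neither the authors' model nor the one in [11]: the two features (yard differential and turnover differential), the effect sizes used to generate the games, and every function name are assumptions chosen purely for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_games(n, seed=0):
    """Hypothetical data: ((yard diff / 100, turnover diff), won flag)."""
    rng = random.Random(seed)
    games = []
    for _ in range(n):
        yards = rng.gauss(0.0, 1.0)      # offensive yard differential, in 100s
        turnovers = rng.randint(-3, 3)   # turnovers committed minus forced
        # Made-up effect sizes, NOT estimates from real NFL data.
        p_win = sigmoid(1.5 * yards - 1.0 * turnovers)
        games.append(((yards, turnovers), 1 if rng.random() < p_win else 0))
    return games


def fit_logistic(games, lr=0.1, epochs=300):
    """Fit [intercept, w_yards, w_turnovers] by batch gradient descent."""
    w = [0.0, 0.0, 0.0]
    n = len(games)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for (x1, x2), y in games:
            err = sigmoid(w[0] + w[1] * x1 + w[2] * x2) - y
            grad[0] += err
            grad[1] += err * x1
            grad[2] += err * x2
        w = [wi - lr * gi / n for wi, gi in zip(w, grad)]
    return w


def win_probability(w, x):
    # Predicted probability of winning given the two box score features.
    return sigmoid(w[0] + w[1] * x[0] + w[2] * x[1])


def cross_val_accuracy(games, k=5):
    """k-fold cross-validation accuracy, thresholding probabilities at 0.5."""
    folds = [games[i::k] for i in range(k)]
    correct = 0
    for i in range(k):
        train = [g for j, fold in enumerate(folds) if j != i for g in fold]
        w = fit_logistic(train)
        correct += sum(
            (win_probability(w, x) >= 0.5) == (y == 1) for x, y in folds[i]
        )
    return correct / len(games)


games = make_synthetic_games(500)
weights = fit_logistic(games)
accuracy = cross_val_accuracy(games)
print("weights:", weights)
print("cross-validation accuracy:", accuracy)
```

With so few features, each fitted weight can be read directly as the change in the log-odds of winning per unit change in one box score statistic, which is the kind of transparency a short variable list buys.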
Most importantly though, we combine our model with statistical bootstrap in order to facilitate future game predictions (something that the model presented in [11] is not able to perform). Of course, predictive models for NFL games have been developed by major sports networks. For example, ESPN has developed the Football Power Index, which is used to make probabilistic predictions for upcoming matchups [12]. Software companies have also developed their own models (e.g., Cortana from Microsoft [13]). Nevertheless, these models are proprietary and are not open to the public.

In this study we are first interested in providing a simple model that is able to quantify the impact of various factors on the probability of winning a game of American football. How much does a turnover affect a team’s probability of winning? Can you really win a game after having turned the ball over 5 times? While coaches and players know the qualitative answer to similar questions, the goal of our work is to provide a quantitative answer. For this purpose we use play-by-play data for the last seven seasons of the National Football League (i.e., between 2009 and 2015) and we extract specific team stat (...truncated)