Expected goals in football: Improving model performance and demonstrating value

PLOS ONE, Apr 2023

Recently, football has seen the creation of various novel metrics that are now ubiquitous throughout clubs’ analytics departments. These can influence many of their day-to-day operations, ranging from financial decisions on player transfers to the evaluation of team performance. At the forefront of this scientific movement is expected goals (xG), a metric that allows analysts to quantify how likely a given shot is to result in a goal. However, xG models have until now not considered some important features, e.g., player/team ability and psychological effects, and the metric is not widely trusted in the wider football community. This study aims to address both of these issues using machine learning techniques: by modelling expected goals with previously untested features, and by comparing the predictive ability of traditional statistics against this newly developed metric. Error values from the expected goals models built in this work were shown to be competitive with optimal values from other papers, and some of the features added in this study were revealed to have a significant impact on expected goals model outputs. Secondly, not only was expected goals found to be a superior predictor of a football team’s future success when compared to traditional statistics, but our results also outperformed those collected from an industry leader in the same area.


PLOS ONE Research Article

James Mead☯, Anthony O’Hare☯*, Paul McMenemy☯
Computing Science and Mathematics, University of Stirling, Stirling, United Kingdom
☯ These authors contributed equally to this work.
* Corresponding author

Citation: Mead J, O’Hare A, McMenemy P (2023) Expected goals in football: Improving model performance and demonstrating value. PLoS ONE 18(4): e0282295. https://doi.org/10.1371/journal.pone.0282295

Editor: Rabiu Muazu Musa, Universiti Malaysia Terengganu, MALAYSIA

Received: April 5, 2022
Accepted: February 11, 2023
Published: April 5, 2023

Copyright: © 2023 Mead et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement:
- The main source of data was Wyscout (available at https://www.nature.com/articles/s41597-019-0247-7), and it can be found at https://figshare.com/collections/Soccer_match_event_dataset/4415000/5
- Data on player value was taken from a dataset published on Kaggle, which included data scraped from Transfermarkt. The dataset can be found at https://www.kaggle.com/datasets/kriegsmaschine/soccerplayers-values-and-their-statistics
- Data on both match attendance and the xG values used to compare models with other sources of xG data were taken from Fbref (https://fbref.com/en/)
- Data on ELO ratings was taken from Clubelo (http://clubelo.com/).

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Uncertainty plays a role in all sports and is a key reason why people enjoy engaging with them. The knowledge that luck (alongside performance) can determine who wins and loses is what draws many people in. This factor is arguably most prevalent in football: because football is low-scoring compared with other sports, uncertainty often strongly influences the result of a match [1–5]. This is the ultimate motivation behind novel metrics such as expected goals (commonly shortened to ‘xG’). Put simply, expected goals assigns a probability between 0 and 1 to each shot taken by a team in a game (0 indicating no possibility of the shot being a goal and 1 indicating a definite goal). This is a better way of dealing with the randomness in football than, for example, a traditional goal-based metric, since shots are far more common events than goals [4, 5]. Producing a probability value indicating how likely each shot is to result in a goal helps to give analysts an unbiased view of what occurred in the game: more specifically, how many goals both teams ‘should have’ scored given the chances they created.

In 2018, FIFA reported that that year’s World Cup in Russia amassed a viewership of 3.572 billion [6]. This figure dwarfs the audiences reached in cricket, widely believed to be the second most popular sport, where estimates for the 2019 ICC Men’s Cricket World Cup stood at 1.6 billion [7]. Naturally, this immense following means there is considerable economic value inherent within football. Therefore, finding ways for clubs to predict future outcomes with greater confidence, and thus gain an advantage, can be extremely financially beneficial. Expected goals provides analysts with this advantage, one which can aid decision-making at both the sporting and business levels of football. Not only can it help to improve the fortunes of football clubs on the pitch through tactical analysis of player and team performance, but it can also assist in financial situations such as player acquisition and contract negotiation. This is where xG’s true power lies.

Since xG’s inception, the metric has become ubiquitous within football. The majority of top-level football teams and betting companies make use of the statistic (and the related concepts of expected assists and post-shot expected goals), using it to aid the development and acquisition of players at clubs and to refine betting-odds models at gambling sites [4, 8, 9].
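To make the per-shot definition above concrete: summing a team’s shot probabilities gives the number of goals it ‘should have’ scored, which can then be set against the goals it actually scored. The following is a minimal Python sketch, assuming a hypothetical `Shot` record whose `xg` values come from some already-fitted model; the match and its xG values are invented purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    team: str
    xg: float   # estimated probability that the shot results in a goal, in [0, 1]
    goal: bool  # observed outcome of the shot


def match_summary(shots):
    """Return {team: (total xG, goals scored)} for one match."""
    xg = defaultdict(float)
    goals = defaultdict(int)
    for shot in shots:
        if not 0.0 <= shot.xg <= 1.0:
            raise ValueError(f"xG must lie in [0, 1], got {shot.xg}")
        xg[shot.team] += shot.xg            # goals the team 'should have' scored
        goals[shot.team] += int(shot.goal)  # goals it actually scored
    return {team: (round(xg[team], 2), goals[team]) for team in xg}


# Hypothetical shots from one match; the xG values are invented for illustration.
shots = [
    Shot("Home", 0.76, True),
    Shot("Home", 0.05, False),
    Shot("Home", 0.12, False),
    Shot("Away", 0.30, True),
    Shot("Away", 0.08, False),
    Shot("Away", 0.04, True),
]
print(match_summary(shots))  # {'Home': (0.93, 1), 'Away': (0.42, 2)}
```

In this invented match the away side scores twice from 0.42 xG while the home side scores once from 0.93, the kind of gap between chances created and final result that the introduction attributes to uncertainty in a low-scoring sport.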
Despite analytics teams at football clubs and statisticians at betting companies championing expected goals and even incorporating it into their work, the concept is not as widely accepted by fans and pundits. This paper also aims to demonstrate the value that expected goals can bring to football analytics by comparing its ability to predict match outcomes against that of traditional statistics. It is not clear when the expected goals statistic was first developed or who conceived it: most [1, 9, 10] state that Macdonald’s [11] study of shot outcomes in ice hockey originated the term, whilst others [3] have attributed it to Green’s [12] article. At its core, expected goals can be framed as a classification problem, since it estimates the probability that a shot results in a goal; this is why machine learning and statistical methods are applied to calculate these probabilities. Different approaches to modelling xG include logistic regression, gradient boosting, neural networks, support vector machines and tree-based classification algorithms [1, 2, 13]. Most of the features incorporated into (...truncated)
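Framing xG as binary classification (goal or no goal), the logistic-regression approach listed above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the method or feature set used in this paper: the two features (shot distance and the angle subtended by the goal mouth), the pitch coordinates, the synthetic training shots and their assumed "true" coefficients are all illustrative, batch gradient descent on standardised features stands in for a library solver, and log loss is used only as an example error measure.

```python
import math
import random

GOAL_WIDTH = 7.32  # metres between the posts


def shot_features(x, y):
    """Return [distance (m), goal-mouth angle (rad)] for a shot at (x, y).

    Assumed coordinates: x is the distance from the goal line and y the
    lateral offset from the centre of the goal, both in metres.
    """
    distance = math.hypot(x, y)
    # Angle between the lines from the shot location to each post.
    angle = math.atan2(GOAL_WIDTH * x, x * x + y * y - (GOAL_WIDTH / 2) ** 2)
    return [distance, angle]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_loss(probs, labels):
    """Mean binary cross-entropy of predicted probabilities."""
    eps = 1e-12
    return -sum(y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
                for p, y in zip(probs, labels)) / len(labels)


class LogisticXG:
    """Logistic-regression xG model fitted by batch gradient descent."""

    def __init__(self, lr=1.0, epochs=400):
        self.lr, self.epochs = lr, epochs

    def _scale(self, f):
        return [(v - m) / s for v, m, s in zip(f, self.means, self.stds)]

    def _logit(self, z):
        return self.w[0] + sum(wi * zi for wi, zi in zip(self.w[1:], z))

    def fit(self, features, labels):
        # Standardise each feature so one learning rate suits metres and radians.
        cols = list(zip(*features))
        self.means = [sum(c) / len(c) for c in cols]
        self.stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
                     for c, m in zip(cols, self.means)]
        rows = [self._scale(f) for f in features]
        n, k = len(rows), len(rows[0])
        self.w = [0.0] * (k + 1)  # w[0] is the intercept
        for _ in range(self.epochs):
            grad = [0.0] * (k + 1)
            for z, label in zip(rows, labels):
                err = sigmoid(self._logit(z)) - label  # d(log loss)/d(logit)
                grad[0] += err
                for j, zj in enumerate(z):
                    grad[j + 1] += err * zj
            self.w = [wi - self.lr * g / n for wi, g in zip(self.w, grad)]
        return self

    def predict(self, f):
        """Estimated probability that a shot with features f is a goal."""
        return sigmoid(self._logit(self._scale(f)))


# Synthetic training shots drawn from an assumed "true" model (not real data).
rng = random.Random(0)
features, labels = [], []
for _ in range(800):
    f = shot_features(rng.uniform(1, 35), rng.uniform(-20, 20))
    p_true = sigmoid(-1.0 - 0.12 * f[0] + 1.2 * f[1])
    features.append(f)
    labels.append(1 if rng.random() < p_true else 0)

model = LogisticXG().fit(features, labels)
close, far = model.predict(shot_features(6, 0)), model.predict(shot_features(25, 10))
print(f"close central shot: {close:.2f} xG, distant wide shot: {far:.2f} xG")
```

Standardising the features keeps a single learning rate stable for both distance (tens of metres) and angle (radians). In a sketch like this, additional features such as the player/team ability measures this paper proposes could be appended as extra columns of `features` without changing the fitting code.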


Article home page: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0282295

James Mead, Anthony O’Hare, Paul McMenemy. Expected goals in football: Improving model performance and demonstrating value, PLOS ONE, 2023, Volume 18, Issue 4, DOI: 10.1371/journal.pone.0282295