Expected goals in football: Improving model performance and demonstrating value
PLOS ONE
RESEARCH ARTICLE
James Mead☯, Anthony O’Hare☯*, Paul McMenemy☯
Computing Science and Mathematics, University of Stirling, Stirling, United Kingdom
☯ These authors contributed equally to this work.
*
Citation: Mead J, O’Hare A, McMenemy P (2023) Expected goals in football: Improving model
performance and demonstrating value. PLoS ONE 18(4): e0282295.
https://doi.org/10.1371/journal.pone.0282295
Editor: Rabiu Muazu Musa, Universiti Malaysia
Terengganu, MALAYSIA
Received: April 5, 2022
Accepted: February 11, 2023
Published: April 5, 2023
Abstract
Recently, football has seen the creation of various novel metrics that are now used throughout
clubs’ analytics departments. These can influence many day-to-day operations, ranging from
financial decisions on player transfers to the evaluation of team performance. At the forefront
of this scientific movement is expected goals (xG), a metric which allows analysts to quantify
how likely a given shot is to result in a goal. However, xG models have not until this point
considered important features such as player/team ability and psychological effects, and the
metric is not widely trusted in the wider football community. This study aims to address both
issues through the implementation of machine learning techniques, by modelling expected goals
values using previously untested features and by comparing the predictive ability of traditional
statistics against this newly developed metric. Error values from the expected goals models built
in this work were shown to be competitive with optimal values from other papers, and some of the
features added in this study were revealed to have a significant impact on expected goals model
outputs. Furthermore, not only was expected goals found to be a superior predictor of a football
team’s future success when compared to traditional statistics, but our results also outperformed
those collected from an industry leader in the same area.
Copyright: © 2023 Mead et al. This is an open
access article distributed under the terms of the
Creative Commons Attribution License, which
permits unrestricted use, distribution, and
reproduction in any medium, provided the original
author and source are credited.
Data Availability Statement:
- The main source of data was Wyscout (available at https://www.nature.com/articles/s41597-019-0247-7) and can be found at https://figshare.com/collections/Soccer_match_event_dataset/4415000/5
- Data on player value were taken from a dataset published on Kaggle, which included data scraped from Transfermarkt. The dataset can be found at https://www.kaggle.com/datasets/kriegsmaschine/soccerplayers-values-and-their-statistics
- Data on both match attendance and xG values used to compare models with other sources of xG data were taken from Fbref (https://fbref.com/en/)
- Data on ELO ratings were taken from Clubelo (http://clubelo.com/).
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Uncertainty plays a role in all sports and is a key reason why people enjoy following them.
The knowledge that luck (alongside performance) can determine who wins and loses is what
draws many people in. This factor is arguably most prevalent in football. Because football is
low-scoring compared with other sports, uncertainty often strongly influences the result of a
match [1–5]. This is the ultimate motivation behind novel metrics such as expected goals
(commonly shortened to ‘xG’). Put simply, expected goals assigns a probability between 0 and
1 to each shot taken by a team in a game (0 indicating no possibility of the shot being a goal
and 1 indicating a definite goal). This is a better way of dealing with the randomness in football
than, for example, a traditional goal-based metric, since a shot is a much more common event
than a goal [4, 5]. Producing a probability value, indicating how likely the shot is to result in a
goal, helps to give analysts an unbiased view of what occurred in the game and, more specifically,
of how many goals both teams ‘should have’ scored given the chances they created.
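As an illustrative sketch of how per-shot probabilities aggregate into the number of goals a team ‘should have’ scored, the following Python snippet sums hypothetical shot probabilities for two teams. The shot values and the function names `expected_goals` and `prob_no_goal` are assumptions made for this example and are not taken from the models in this paper. `prob_no_goal` additionally assumes that shots are independent of one another, which is a simplification.

```python
from math import prod


def expected_goals(shot_probs):
    """Sum per-shot scoring probabilities into a team's expected goals (xG)."""
    for p in shot_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"shot probability out of range: {p}")
    return sum(shot_probs)


def prob_no_goal(shot_probs):
    """Probability that no shot is scored, ASSUMING shots are independent."""
    return prod(1.0 - p for p in shot_probs)


# Hypothetical xG values for the shots each team took in one match.
home_shots = [0.5, 0.25, 0.125]
away_shots = [0.75]

home_xg = expected_goals(home_shots)  # 0.875
away_xg = expected_goals(away_shots)  # 0.75
```

Here the home team took three shots worth 0.875 expected goals in total, while the away team's single high-quality chance was worth 0.75, a picture of the match that the final score alone would not convey.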
In 2018, FIFA reported that the most recent World Cup tournament in Russia amassed a
viewership of 3.572 billion [6]. This figure dwarfs those reached in cricket—widely believed to
be the second most popular sport, with audience estimates for the ICC Men’s Cricket World
Cup in 2019 standing at 1.6 billion [7]. Naturally, this immense following means there is considerable economic value inherent within football. Therefore, discovering ways in which clubs
are able to predict future outcomes with greater confidence, and thus gain an advantage, can
prove to be extremely financially beneficial. Expected goals provides analysts with this advantage, one which can aid in decision-making at both the sporting and business levels of football. Not only can it help to improve the fortunes of football clubs on the pitch through tactical
analysis of player and team performance, but it can also assist in financial situations such as
player acquisition and contract negotiation. This is where xG’s true power lies.
Since xG’s inception, the metric has become ubiquitous within football. The majority of
top-level football teams and betting companies make use of the statistic (and the related concepts
of expected assists and post-shot expected goals): clubs use it to aid the development or
acquisition of players, and gambling sites use it to refine their betting odds models [4, 8, 9].
Although analytics teams at football clubs and statisticians at betting companies champion
the idea of expected goals and incorporate it into their work, the concept is not as well
regarded by fans and pundits. This paper therefore also aims to demonstrate the value that
expected goals can bring to football analytics by comparing its ability to predict match outcomes
against that of traditional methods.
It is not clear when the expected goals statistic was first developed or who conceived it.
Most sources [1, 9, 10] state that Macdonald’s [11] study into shot outcome in ice hockey originated the term, whilst others [3] have attributed it to Green’s [12] article. At its core, the concept of expected goals can be thought of as a classification problem, because it is the
probability that a shot results in a goal. This is why machine learning and statistical methods
are applied to calculate these probabilities. Different approaches to modelling xG
include logistic regression, gradient boosting, neural networks, support vector machines and
tree-based classification algorithms [1, 2, 13]. Most of the features incorporated into (...truncated)