1 Introduction

When thinking about football (soccer) leagues, one of the first things that comes to mind is the league table. The position of a team in the league table determines the fate of managers, the happiness of fans and how much money the club can spend in the next season. Since the league table must determine which teams win the league, which qualify for European competitions and which are relegated, it contains a lot of tie breakers so that each team is assigned a unique rank.

However, if we wish to assess the performances of teams from a data analytics point of view, the league table has multiple downsides. First, we would like to evaluate the number of substantially different levels of playing quality in the league. The league table offers as many levels as there are teams in the league, but one would expect a much smaller number of quality levels (for example, the top clubs who compete for the trophy, the clubs fighting against relegation and the ones somewhere in between). Second, we would like to understand against which kind of opposition teams perform well. For instance, some teams might earn a lot of points by very consistently beating bad teams but at the same time losing most games against the very good teams. The league table would cover this up since the number of points awarded for a result is exactly the same against good and bad opposition.

1.1 Our Contributions

In this paper, we take a step towards resolving the aforementioned issues. That is, we show how to find a small number of interpretable levels of playing quality in four major European leagues. We also point out how some of our results indicate that certain teams perform well (or badly) against very good teams.

To obtain our results, we define graphs based on the match results in the football leagues we consider. We then apply algorithms for finding hierarchies in graphs. These algorithms will provide us with a ranking of the teams into a small number of levels which are highly interpretable.

In Sect. 3 we briefly discuss for how many matchdays such rankings can be obtained without any conflicting results. A conflict arises, for example, when team A beats team B, team B beats team C and team C beats team A. For such conflicting results it is inherently difficult to find good rankings. We will formalize this in Sect. 3.

We address this issue with conflicts by introducing a generalized version of the agony problem [7] in Sect. 4.

To be a bit more specific, we will be looking at two types of graphs: (1) Directed graphs, where the vertices are the teams of the league and where we insert an edge from team A to team B if A beats B, and (2) undirected graphs, where we insert edges if A and B played a draw. The graph algorithms will then partition the vertices into levels such that (1) most directed edges point from “good” teams (at high levels) towards “bad” teams (at lower levels), and (2) most undirected edges are between clubs on the same level.

Section 5 presents our results. We provide hierarchies of the teams in four major European football leagues in the 2017/2018 season.

2 Related Work

Ranking Football Teams. Several approaches for assessing the quality of football teams have been proposed.

Hvattum and Arntzen [11] and Leitner et al. [13] considered a modification of the ELO rating system. ELO was initially developed for chess [5] and [11, 13] modified it for usage in the football domain. Some implementations of ELO ratings are also available online [1, 2].

Constantinou and Fenton [4] introduced pi-ratings and they demonstrated that pi-ratings perform better than the ELO rankings of [11, 13] when predicting the outcomes of future matches. Pi-ratings have also been used in state of the art methods for predicting the outcomes of football matches [3, 10].

ELO and pi-ratings assign quality scores to teams and these scores can be used to assess the strength of the teams. However, unlike the results presented in this paper, these scores are continuous-valued and do not group the teams into a small number of substantially different playing levels.

Hierarchies in Graphs. Finding hierarchies in directed graphs is an old and fundamental problem. In this line of work, one is given a directed graph \(G = (V,E)\) and one wants to assign levels to the vertices in V such that certain properties are satisfied. A drawing of the graph respecting these levels is called a hierarchy.

For example, if G is a directed acyclic graph (DAG), we can draw G such that all of its edges point downwards. Then the drawing immediately provides the levels of the vertices in V. However, such an approach fails as soon as G contains a cycle. See Sect. 3 for more details.

One classic way to obtain hierarchies from graphs which contain cycles is to solve the feedback arc set problem. This problem asks to find a subgraph \(H = (V,E')\) of G such that H is a DAG and the number of non-DAG edges \(E \setminus E'\) is minimized. Unfortunately, this problem is NP-hard to solve [12] and known polynomial time algorithms only achieve poly-logarithmic approximations [6]. Hanauer [8, 9] conducted an experimental study on algorithms for the feedback arc set problem.

Gupte et al. [7] introduced agony, which can be viewed as a tractable version of the feedback arc set problem. Agony does not minimize the number of non-DAG edges (i.e., the edges pointing upwards); instead, it weights each of these non-DAG edges by the number of levels it points upwards. We provide a technical definition of agony in Sect. 4. Gupte et al. [7] showed that agony can be used to identify hierarchies in social networks and to find hierarchies in American football leagues; they also provided a polynomial time algorithm. Later, Tatti [15,16,17] provided faster and more general algorithms for computing agony.

However, neither the feedback arc set problem nor agony as studied in [7, 15,16,17] allow for finding hierarchies in which certain vertices in the graph are supposed to be on the same level. Thus, while these algorithms can be used for sport leagues without draws (such as American football or basketball), they do not apply to sports like football where draws frequently occur.

3 A Partial Order for a Partial Season

In this section, we discuss for how many matchdays we can rank the teams of a football league based on the results of their games without contradicting results.

To start out, let us first formalize what we mean by contradicting results. Consider the set of teams T and define a binary relation > on the set of teams T as follows. For teams \(A,B \in T\), let \(A > B\) if A won against B. Now, we have contradicting results if there are teams \(A_1, \dots , A_k\) such that \(A_1> A_2> \dots> A_k > A_1\). For example, such a contradiction occurs if Chelsea beat Liverpool, who in turn beat Manchester United, who in turn beat Chelsea.

Fig. 1. The first seven matchdays of the German Bundesliga season 2015/16.

Note that if the results contain a contradiction, we cannot rank the teams in the league properly. In the previous example, it is not clear how we should rank Chelsea, Liverpool and Manchester United because any of them could be placed on level 1.

Hence, for the rest of this section, let us only consider the first \(\ell \) matchdays of a league whose results contain no contradictions. In such a case the result graph G is a directed acyclic graph (DAG). More formally, \(G = (T,E)\) is the graph with the teams in T as vertices and containing an edge from team A to team B if \(A > B\).

If G is a DAG, then it is quite simple to obtain a ranking and a drawing for it. We can place the teams without incoming edges at level 1 (i.e., those teams which did not lose). Next, at level 2, we place all vertices which only have incoming edges from vertices already placed. Then we continue in this fashion for all levels \(i \ge 3\). By construction, all edges point downwards.
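The layering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `dag_levels` and the input format (a list of `(winner, loser)` pairs) are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def dag_levels(teams, wins):
    """Assign levels to teams from a list of (winner, loser) pairs.

    Level 1 holds the teams without incoming edges; a team is placed on the
    next level once every team that beat it has been placed. Returns None if
    the results contain a cycle, because no such layering exists then.
    """
    beaten_by = defaultdict(set)  # team -> set of teams that beat it
    for winner, loser in wins:
        beaten_by[loser].add(winner)

    level = {}
    remaining = set(teams)
    current = 1
    while remaining:
        # Teams all of whose conquerors already sit on an earlier level.
        ready = {t for t in remaining
                 if all(w in level for w in beaten_by[t])}
        if not ready:
            return None  # every remaining team lost to a remaining team
        for t in ready:
            level[t] = current
        remaining -= ready
        current += 1
    return level


print(dag_levels(["A", "B", "C"], [("A", "B"), ("B", "C"), ("A", "C")]))
print(dag_levels(["A", "B", "C"], [("A", "B"), ("B", "C"), ("C", "A")]))
```

The second call uses the cyclic pattern from the Chelsea, Liverpool and Manchester United example and therefore yields no layering.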

To obtain interesting rankings for a given league, we consider the largest number L such that the results of the first L matchdays contain no contradictions. Figure 1 presents the result graph G for the German Bundesliga season 2015/16. We picked this season because it had \(L=7\), yielding an elaborate diagram of the first seven matchdays. Subsequent Bundesliga seasons each had \(L=4\).
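One way to determine L is to add the results matchday by matchday and stop as soon as the result graph contains a cycle. The sketch below assumes a hypothetical input format (one list of `(winner, loser)` pairs per matchday, draws omitted) and uses Kahn's algorithm for the acyclicity test; neither choice is prescribed by the paper.

```python
def has_cycle(edges):
    """Return True if the directed graph given by (winner, loser) pairs has a cycle."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
        succ.setdefault(v, set())
    indeg = {v: 0 for v in succ}
    for u in succ:
        for v in succ[u]:
            indeg[v] += 1
    # Kahn's algorithm: repeatedly remove vertices without incoming edges.
    queue = [v for v, d in indeg.items() if d == 0]
    removed = 0
    while queue:
        u = queue.pop()
        removed += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    # Vertices that could never be removed lie on or behind a cycle.
    return removed < len(succ)


def largest_conflict_free_prefix(matchdays):
    """Largest L such that the first L matchdays contain no contradiction."""
    edges = []
    best = 0
    for day, results in enumerate(matchdays, start=1):
        edges.extend(results)
        if has_cycle(edges):
            break
        best = day
    return best


toy_season = [[("A", "B")], [("B", "C")], [("C", "A")], [("A", "D")]]
print(largest_conflict_free_prefix(toy_season))
```

In the toy season, the third matchday closes the cycle A, B, C, so only the first two matchdays are conflict-free.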

Unfortunately, contradicting results are quite common in football leagues or tournaments. Skinner and Freeman [14, Sect. 3] study this problem for the FIFA World Cup results from 1938–2006 and for the case of random game outcomes.

4 Agony for Sport Leagues with Draws

In this section, we show how we can obtain meaningful hierarchies even in the presence of contradicting results as per Sect. 3. To do this, we will consider a generalized version of agony which we will be using throughout the rest of the paper. Agony was first studied by Gupte et al. [7] (see Sect. 2).

Definition

Let \(G = (V, E_{\text {dir}}, w)\) be a directed graph with edge weights \(w(u,v)\) and let \(H = (V,E_{\text {undir}})\) be an undirected unweighted graph. Both graphs are defined over the same set of vertices. The ranking that we will compute is related to G and H in the following way. Edges from u to v in the directed graph G indicate that u should be above v (this corresponds to team u winning against team v). The edges in the undirected graph H, on the other hand, indicate that u and v should be on the same level in the hierarchy (this corresponds to teams u and v drawing). We will now make this more formal.

A ranking r is a function \(r : V \rightarrow \mathbb {N}\) which maps a vertex v to a level r(v). In the following, when we draw the vertices in V according to some ranking function r, we will start with vertices in level 1 at the top, followed by the vertices on level 2 below that, and so on. Thus, an edge from u at level 5 to a node v on level 2 is pointing upwards; vice versa, an edge from u at level 1 to v at level 9 is pointing downwards.

Now consider a fixed ranking function r. We will define two different quantities: First, the agony caused by r on G and, second, the agony of r on H. Intuitively, edges pointing upwards w.r.t. r will cause agony and we will want to find a ranking function which has few of such edges.

The agony of r on G is defined as

$$\begin{aligned} A_{\text {dir}}(G,r) = \sum _{(u,v) \in E_{\text {dir}}} \max \{ r(u) - r(v) + 1, 0 \} \cdot w(u,v). \end{aligned}$$
(1)

To develop a better understanding of the sum in Eq. 1, consider an edge directed from u to v and suppose that all weights \(w(u,v)\) are 1. Now, if u has a higher rank than v (e.g., \(r(u) = 5\) and \(r(v) = 1\)), then the edge is pointing upwards in the hierarchy. Intuitively, this should introduce some agony and, indeed, we have \(r(u) - r(v) + 1 > 0\). On the other hand, if u has a smaller rank than v (e.g., \(r(u) = 1\), \(r(v) = 2\)), the quantity \(r(u) - r(v) + 1\) is non-positive and no agony will be caused by this edge. This aligns well with our intuition since the edge is pointing downwards in the drawing of the vertices. Finally, if the two vertices are on the same level, the agony of the edge will be exactly 1 due to the \(+1\) term. Note that in Eq. 1 we also multiply this quantity by the corresponding edge weight.

We note that \(A_{\text {dir}}(G,r)\) is the definition of agony used in [7], generalized to weighted graphs.
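To make Eq. 1 concrete, the following sketch evaluates the directed agony for two rankings of the three-team cycle from Sect. 3. The function name `directed_agony` and the dictionary-based input format are illustrative assumptions.

```python
def directed_agony(edges, rank):
    """Evaluate Eq. 1. edges maps (u, v) to w(u, v); rank maps a vertex to its level."""
    return sum(max(rank[u] - rank[v] + 1, 0) * w for (u, v), w in edges.items())


# Cycle A > B > C > A with unit weights.
edges = {("A", "B"): 1, ("B", "C"): 1, ("C", "A"): 1}
same_level = {"A": 1, "B": 1, "C": 1}
chain = {"A": 1, "B": 2, "C": 3}

print(directed_agony(edges, same_level))  # 3: each edge contributes 1
print(directed_agony(edges, chain))       # 3: only C -> A points upwards, costing 3 - 1 + 1
```

Both rankings cause the same agony here: the cycle cannot be drawn without paying for it somewhere, either on every edge or on a single long upward edge.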

The agony of r on H is given by

$$\begin{aligned} A_{\text {undir}}(H,r) = \sum _{(u,v) \in E_{\text {undir}}} | r(u) - r(v) |. \end{aligned}$$
(2)

Note that \(A_{\text {undir}}(H,r)\) does not introduce any agony for edges \((u,v)\) which have both endpoints on the same level (because then \(|r(u) - r(v)| = 0\)). However, for edges \((u,v)\) with endpoints on different levels, it will count the level difference.
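As a small worked example of Eq. 2 (the vertices a, b, c and the ranking are chosen here purely for illustration), suppose \(E_{\text {undir}} = \{(a,b), (b,c)\}\) with \(r(a) = 1\), \(r(b) = 1\) and \(r(c) = 3\). Then

$$\begin{aligned} A_{\text {undir}}(H,r) = |r(a) - r(b)| + |r(b) - r(c)| = |1 - 1| + |1 - 3| = 2. \end{aligned}$$

The draw between a and b is free because both teams are on the same level, whereas the draw between b and c costs the two levels that separate them.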

Now our goal is to find a ranking function \(r^*\) which minimizes the sum of the agony on both graphs, i.e., we look for the minimizer of

$$\begin{aligned} A(G,H) = \min _{r : V \rightarrow \mathbb {N}} A_{\text {dir}}(G,r) + A_{\text {undir}}(H,r). \end{aligned}$$
(3)

We note that when setting the undirected graph H to an empty graph (i.e., \(E_{\text {undir}}= \emptyset \)), then Eq. 3 becomes exactly the same as in [7].
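For very small instances, the minimization in Eq. 3 can be checked by exhaustive search. The sketch below is such a sanity check and not the method used in the paper; it restricts the search to levels 1 to |V|, which loses nothing because an empty level can be removed without increasing the agony. All names are hypothetical.

```python
from itertools import product


def total_agony(dir_edges, undir_edges, rank):
    """Sum of Eq. 1 and Eq. 2 for a fixed ranking."""
    a_dir = sum(max(rank[u] - rank[v] + 1, 0) * w
                for (u, v), w in dir_edges.items())
    a_undir = sum(abs(rank[u] - rank[v]) for u, v in undir_edges)
    return a_dir + a_undir


def brute_force_ranking(vertices, dir_edges, undir_edges):
    """Try every assignment of levels 1..|V|; return (minimum agony, ranking)."""
    vertices = list(vertices)
    levels = range(1, len(vertices) + 1)
    best = None
    for assignment in product(levels, repeat=len(vertices)):
        rank = dict(zip(vertices, assignment))
        cost = total_agony(dir_edges, undir_edges, rank)
        if best is None or cost < best[0]:
            best = (cost, rank)
    return best


# A beats B, B beats C, and A draws with C.
cost, rank = brute_force_ranking(
    ["A", "B", "C"], {("A", "B"): 1, ("B", "C"): 1}, [("A", "C")])
print(cost, rank)
```

The search space grows as |V| to the power |V|, so this is only practical for a handful of vertices.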

Integer Linear Program (ILP). Based on the definitions of agony, we now state the ILP that we solve to compute the agony \(A(G,H)\). Solving the ILP also provides the ranking function \(r^*\) that minimizes the agony.

The ILP is stated in Eq. 4 and it uses the following variables: \(r(u)\) is the level assigned to vertex u; \(x(u,v)\) is the agony caused by the directed edge \((u,v) \in E_{\text {dir}}\); \(y(u,v)\) corresponds to the agony caused by the undirected edge \((u,v) \in E_{\text {undir}}\).

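$$\begin{aligned} \min \quad&\sum _{(u,v) \in E_{\text {dir}}} x(u,v) + \sum _{(u,v) \in E_{\text {undir}}} y(u,v) \\ \text {s.t.} \quad&x(u,v) \ge (r(u) - r(v) + 1) \cdot w(u,v)&\text {for all } (u,v) \in E_{\text {dir}}, \\&x(u,v) \ge 0&\text {for all } (u,v) \in E_{\text {dir}}, \\&y(u,v) \ge r(u) - r(v)&\text {for all } (u,v) \in E_{\text {undir}}, \\&y(u,v) \ge r(v) - r(u)&\text {for all } (u,v) \in E_{\text {undir}}, \\&r(u) \in \mathbb {N}&\text {for all } u \in V. \end{aligned}$$
(4)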

We now show that the ILP in Eq. 4 correctly computes the agony.

Lemma 1

The ILP stated in Eq. 4 correctly computes the agony as defined in Eq. 3.

Proof

(Sketch). Let \(r(u)\), \(x(u,v)\), \(y(u,v)\) be an optimal solution of the ILP.

First, we show that \(x(u,v) = \max \{ r(u) - r(v) + 1, 0 \} \cdot w(u,v)\) for each \((u,v) \in E_{\text {dir}}\). Indeed, the constraints imply that \(x(u,v) \ge 0\) and \(x(u,v) \ge (r(u) - r(v) + 1) \cdot w(u,v)\). Since \(x(u,v)\) is not present in any other constraint and since we are minimizing the objective function, we obtain the desired equality. (The same constraints were also derived in [7].)

Second, we show that \(y(u,v) = |r(u) - r(v)|\) for each edge \((u,v) \in E_{\text {undir}}\). The constraints require \(y(u,v) \ge r(u) - r(v)\) and \(y(u,v) \ge r(v) - r(u)\). Note that either both \(r(u) - r(v)\) and \(r(v) - r(u)\) are zero or exactly one of them is positive. This implies that \(y(u,v)\) is non-negative and also that \(y(u,v) \ge |r(u) - r(v)|\). As before, the equality follows since \(y(u,v)\) does not occur in any other constraint and because we are minimizing the objective function.

It follows that the optimal objective value of the ILP equals the agony defined in Eq. 3.    \(\square \)
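The constraints used in the proof of Lemma 1 translate almost directly into PuLP. The following is a minimal sketch and not the authors' implementation, which is not shown in the paper: the function name, the input format and the upper bound of |V| on the levels are assumptions made here.

```python
def solve_agony_ilp(vertices, dir_edges, undir_edges):
    """Return (agony, ranking).

    dir_edges maps (u, v) to the weight w(u, v); undir_edges is a list of
    (u, v) tuples representing draws.
    """
    import pulp  # imported lazily so the module loads without PuLP installed

    vertices = list(vertices)
    n = len(vertices)
    prob = pulp.LpProblem("agony", pulp.LpMinimize)

    # r(u): integer level. The cap at n is an assumption to keep the model bounded.
    r = {u: pulp.LpVariable(f"r_{i}", lowBound=1, upBound=n, cat="Integer")
         for i, u in enumerate(vertices)}
    # x(u, v) and y(u, v): agony per directed and undirected edge, both >= 0.
    x = {e: pulp.LpVariable(f"x_{i}", lowBound=0) for i, e in enumerate(dir_edges)}
    y = {e: pulp.LpVariable(f"y_{i}", lowBound=0) for i, e in enumerate(undir_edges)}

    # Objective: total agony over both graphs.
    prob += pulp.lpSum(x.values()) + pulp.lpSum(y.values())
    for (u, v), w in dir_edges.items():
        prob += x[(u, v)] >= (r[u] - r[v] + 1) * w
    for (u, v) in undir_edges:
        prob += y[(u, v)] >= r[u] - r[v]
        prob += y[(u, v)] >= r[v] - r[u]

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    ranking = {u: int(round(r[u].value())) for u in vertices}
    return pulp.value(prob.objective), ranking
```

A call such as `solve_agony_ilp(["A", "B", "C"], {("A", "B"): 1, ("B", "C"): 1}, [("A", "C")])` returns the minimum agony together with one optimal ranking; optimal rankings are in general not unique, so the returned levels may differ between solvers.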

4.1 Agony for Football Leagues

Next we discuss how an algorithm for the agony problem from the previous subsection can be used to create the ranking for a football league.

Based on the results of the football league, we create two graphs G and H and each of these graphs has one vertex for each team. The graph G is directed with edge weights \(w(u,v)\) and the graph H is undirected and unweighted.

First, assume that all teams play each other exactly once. Suppose team A beats team B with a goal difference of z. For example, Borussia Dortmund beats Schalke 5:0. In this case, we insert an edge from team A to team B with weight \(z = 5\) into the directed graph G. If team A and team B draw, we insert an undirected edge into the undirected graph H. Thus, G will correspond to games with a winner and H will correspond to games that ended in draws.

Now, to handle multiple games between the same teams, we just add up the goal differences of all of their games. For example, if Borussia Dortmund and Schalke play three games ending 5:1, 0:2, and 4:4, then the goal difference on the edge will be \(4 - 2 + 0 = 2\). That is, we consider the “direct comparison” of the teams.
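The construction of G and H from a list of results might look as follows. The sketch assumes a hypothetical input format of `(home, away, home_goals, away_goals)` tuples and, as an interpretation of the "direct comparison", treats a pair of teams whose summed goal difference is zero as a draw, i.e., as an undirected edge.

```python
from collections import defaultdict


def build_graphs(matches):
    """Build (dir_edges, undir_edges) from (home, away, home_goals, away_goals).

    dir_edges maps (winner, loser) to the summed goal difference of the direct
    comparison; undir_edges lists the pairs whose direct comparison is level.
    """
    net = defaultdict(int)  # (a, b) with a < b -> goals of a minus goals of b
    for home, away, hg, ag in matches:
        a, b = sorted((home, away))
        net[(a, b)] += (hg - ag) if a == home else (ag - hg)

    dir_edges, undir_edges = {}, []
    for (a, b), diff in net.items():
        if diff > 0:
            dir_edges[(a, b)] = diff
        elif diff < 0:
            dir_edges[(b, a)] = -diff
        else:
            undir_edges.append((a, b))
    return dir_edges, undir_edges


# The three games from the example, listed from Dortmund's point of view.
games = [("Dortmund", "Schalke", 5, 1),
         ("Dortmund", "Schalke", 0, 2),
         ("Dortmund", "Schalke", 4, 4)]
print(build_graphs(games))
```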

5 Evaluation

We implemented our algorithm in Python, and used the PuLP library for solving the ILP. The ILP can be solved in a few seconds because the problem instances are small.

To obtain better and more robust rankings, we set the weights in the directed graph G slightly differently than discussed above. If the goal difference of teams u and v is d, then we set the weight of the edge \((u,v)\) to \(w(u,v) = 1 + \log (d)\), where the logarithm has base 2.

There were three reasons for this choice of \(w(u,v)\): (1) When we ran the experiments with weight function \(w(u,v) = d\), the ranking was not very robust. For example, if a team lost only a single game with a relatively large goal difference, this often led to the team losing one level in the hierarchy. Hence, a single bad performance could affect the ranking of a team excessively. (2) The choice of \(w(u,v) = 1 + \log (d)\) has several nice properties. For narrow wins/losses with goal difference \(d = 1, 2\), \(w(u,v) = d\). Thus, for narrow games the choice of \(w(u,v)\) has no influence on the ranking. For higher wins/losses, however, \(d \ge 3\) and \(w(u,v) < d\); thus, larger losses are penalized less and the rankings become more robust than with the weights discussed in point (1). (3) We also tried out different weight functions such as \(w(u,v) = d^\beta \) for different parameters \(\beta \in (0,5]\) but the function we used gave the most natural rankings.
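The properties claimed in point (2) are easy to verify numerically. In the sketch below, the function name `edge_weight` is hypothetical; the formula is the one stated above.

```python
import math


def edge_weight(d):
    """Weight 1 + log2(d) for a positive aggregate goal difference d."""
    if d <= 0:
        raise ValueError("a directed edge needs a positive goal difference")
    return 1 + math.log2(d)


for d in range(1, 7):
    print(d, round(edge_weight(d), 3))
```

For \(d = 1\) and \(d = 2\) the weight equals d, and from \(d = 3\) onwards it grows only logarithmically, so a single heavy defeat weighs much less than it would under \(w(u,v) = d\).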

5.1 Case Study on European Football Leagues

We consider four major European football leagues, namely Premier League (England), La Liga (Spain), Serie A (Italy), and Bundesliga (Germany) for evaluating the ILP. To this end, we used the data from www.football-data.co.uk. The dataset for a football league contains statistics such as match stats, full time results, and total goals from all the games played in that league.
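As an illustration of how such a dataset could be read, the following sketch parses CSV text into match tuples. The column names `HomeTeam`, `AwayTeam`, `FTHG` and `FTAG` (full-time home and away goals) are an assumption about the file layout and should be checked against the downloaded files; the two sample rows are toy data, not real results.

```python
import csv
import io


def read_matches(csv_text):
    """Parse CSV text into (home, away, home_goals, away_goals) tuples."""
    matches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row.get("HomeTeam"):
            continue  # skip blank trailing rows
        matches.append((row["HomeTeam"], row["AwayTeam"],
                        int(row["FTHG"]), int(row["FTAG"])))
    return matches


# Toy rows for illustration only.
sample = """HomeTeam,AwayTeam,FTHG,FTAG
Team A,Team B,2,0
Team B,Team C,1,1
"""
print(read_matches(sample))
```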

Fig. 2. Results on (top) Bundesliga, and (bottom) English Premier League. Accompanying the results on the right hand side are the corresponding league tables for the year 2017/2018.

The results of the ILP on all four leagues are shown in Figs. 2 and 3. For comparison, we also provide the league tables for the season 2017/2018 alongside the resulting hierarchy produced by the ILP.

Bundesliga. In the Bundesliga, the ILP discovers four levels. Bayern Munich is the only team in the top level (LEVEL 1). This is also corroborated by the massive points difference (Pts) as well as the goal difference (GD) between Bayern Munich and the second team in the Bundesliga. The teams from the second to the sixth position in the league table are in the second level (LEVEL 2). Given that they have similar statistics (the number of wins, draws, losses, and the goal difference), the result of the ILP is in agreement with our intuition. LEVEL 4 consists of the two bottom teams Koln and Hamburg, as well as Freiburg (rank 15 in the table). Freiburg is in the fourth level since they lost the direct comparison against five teams on LEVEL 3 (some of them with large margins). The remaining midfield teams are in the third level (LEVEL 3).

English Premier League (EPL). The ILP yields an interesting result on the EPL. Although second-placed Manchester United has far fewer points than the Premier League winner Manchester City, both are in the first level. This may seem counter-intuitive at first, but there is more to it. Manchester United lost the home game but won the away game against Manchester City. In addition, the results of Manchester United are very similar to those of Manchester City against the teams on LEVEL 2 (4W, 1D, 1L of United compared to 5W, 0D, 1L of City).

Interestingly, even though Chelsea is fifth in the league table, it is placed in LEVEL 3. Although this might seem odd at first glance, it makes sense on closer inspection. The goal differences in direct comparison of Chelsea against the teams in the top three levels are:

  • (Level 1) Manchester City: \(-2\), Manchester United: 0

  • (Level 2) Liverpool: \(+1\), Arsenal: 0, Tottenham: \(-1\)

  • (Level 3) Leicester: \(+1\), Newcastle: \(-1\), West Ham: \(-1\), Burnley: 0, Bournemouth: \(-2\), Crystal Palace: 0, Watford: \(-1\), Everton: \(+2\)

In the direct comparison, Chelsea did not win against a single team on LEVEL 1, won against only one team on LEVEL 2, and against only two (out of eight) teams on LEVEL 3. This also reflects their inconsistency throughout the 2017/2018 season. As for the teams in LEVEL 4, they are also at the bottom of the league table, and share similar overall statistics.
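The per-level tally can be reproduced directly from the goal differences listed above. The snippet below copies those numbers and counts won, level and lost direct comparisons; the helper name `record` is hypothetical.

```python
# Chelsea's aggregate goal differences per opponent, copied from the list above.
chelsea = {
    1: {"Manchester City": -2, "Manchester United": 0},
    2: {"Liverpool": 1, "Arsenal": 0, "Tottenham": -1},
    3: {"Leicester": 1, "Newcastle": -1, "West Ham": -1, "Burnley": 0,
        "Bournemouth": -2, "Crystal Palace": 0, "Watford": -1, "Everton": 2},
}


def record(diffs):
    """Return (won, level, lost) counts of direct comparisons."""
    won = sum(d > 0 for d in diffs.values())
    lost = sum(d < 0 for d in diffs.values())
    return won, len(diffs) - won - lost, lost


for level, diffs in chelsea.items():
    print(level, record(diffs))
```

Against the eight other LEVEL 3 teams, Chelsea lost more direct comparisons than they won (four against two), which is consistent with their placement on that level.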

Fig. 3. Results on (top) Serie A, and (bottom) La Liga. Accompanying the results on the right hand side are the corresponding league tables for the year 2017/2018.

Serie A. As in the previous two leagues, the ILP discovers four levels in Serie A. Although Roma is above Inter in the league table, only Inter makes it to the top level. The goal differences in direct comparison of Inter against the teams in the top two levels are as follows:

  • (Level 1) Juventus: \(-1\), Napoli: 0

  • (Level 2) Roma: \(+2\), Lazio: \(+1\), Milan: \(+1\), Atalanta: \(+2\), Fiorentina: \(+2\), Torino: \(-1\), Sampdoria: \(+6\)

This suggests that Inter had a very good record against the teams in the top two levels (except against the two teams from Turin, Juventus and Torino), and hence is placed in the top level.

La Liga. In contrast to the other leagues, the ILP discovers five levels in La Liga. Barcelona won La Liga in the 2017/2018 season with only one loss, and they were 14 points ahead of second-placed Atletico Madrid. It is sensible that they are the only team in LEVEL 1. Although Atletico Madrid is above Real Madrid in the league table (with a mere 3-point difference), Real Madrid is the only team on LEVEL 2. If we look at that season’s La Liga matches, we find that although Real Madrid has only marginally better performance against the top three teams in the league table compared to Atletico Madrid, Real Madrid did very well against the rest of the teams. The inclusion of Getafe on LEVEL 3 is due to its better head-to-head performances against the teams on the same level.

6 Conclusion

We generalised hierarchies in weighted networks to sport leagues with draws. As a proof of concept, we considered four major European football leagues. The results show that the proposed algorithm yields a natural ranking of the teams in a football league into a small number of interpretable quality levels. Future work includes applying our generalised formulation of agony to other application areas, such as ranking the products of a retail store, and developing efficient algorithms to discover hierarchies in those domains.