Experiments with Kemeny ranking: What works when?
Highlights
► We compare 104 algorithms for Kemeny rank aggregation across several datasets.
► We find that the problem instances span 3 regimes: strong, weak, and no consensus.
► We make specific algorithmic recommendations for each regime.
► We also propose a computationally fast test to distinguish between the 3 regimes.
► The integer program, local search, and branch and bound algorithms perform well.
Introduction
Preference aggregation has been extensively studied by economists under social choice theory. Arrow discussed certain desirable properties that a ranking rule must have and showed that no rule can satisfy them all simultaneously (Arrow, 1963). Consequently, a variety of models of preference aggregation have been proposed, each satisfying a subset of the properties deemed desirable. Many applications of preference aggregation exist in theoretical computer science as well, including merging the results of various search engines (Cohen et al., 1998, Dwork et al., 2001), collaborative filtering (Pennock et al., 2000), and multiagent planning (Ephrati and Rosenschein, 1993).
The Kemeny ranking rule (Kemeny and Snell, 1962) is widely used in both of these areas. From the point of view of Arrow’s axioms, the Kemeny ranking is the unique rule satisfying local independence of irrelevant alternatives and the reinforcement axiom (Young and Levenglick, 1978). More recently, it has been found to have a natural interpretation as the maximum likelihood ranking under a very simple noise model proposed by Condorcet (Young, 1995). The same noise model was proposed independently in statistics by Mallows (1957) and refined by Fligner and Verducci (1986), under the name Mallows’ model. Further extensions that go beyond the scope of this paper but that are relevant to modeling preferences have been proposed by Mao and Lebanon (2008) and Meilă and Bao (2010).
Finding the Kemeny ranking is unfortunately a computational challenge, since the problem is NP-hard even for only four votes (Bartholdi et al., 1989, Cohen et al., 1998). Since the problem is important across a variety of fields, many researchers across these fields have converged on finding good, practical algorithms for its solution. There are formulations of the problem that lead to exact algorithms, of course without polynomial running time guarantees, and we present two of these in Section 3.1. There are also a large number of heuristic and approximate algorithms, and we enumerate several classes of these algorithms in Section 3.2.
Surveying the various approaches is only the first step, since we are interested in finding which algorithms (or combinations of algorithms) are likely to perform well in practice. For this, we perform a systematic comparison, running all algorithms on a batch of real-world and synthetic Kemeny ranking tasks.
Since we are attacking an intractable task, what we can examine is the trade-off between computational effort and quality of the obtained solution. We will examine which algorithms, if any, systematically achieve optimal trade-offs. We hope thereby to be able to recommend reliable methods to practitioners.
Such an analysis was recently undertaken by Schalekamp and van Zuylen (2009), henceforth abbreviated as SvZ, with whose work our study shares several parallels. While our experimental setup adds little to theirs, we differ fundamentally in the conclusions we draw. That is because we factor in the difficulty of the actual problem instance, something that SvZ do not consider. Since both the performance and the running time of an algorithm can change with the difficulty of the problem, we expect that the “best” algorithm to use will differ depending on the operating regime.
Hence, we propose several natural measures of difficulty for the Kemeny ranking problem, and re-examine the algorithms’ performance results from this perspective.
The rest of the paper is structured as follows. The next section formally defines the Kemeny ranking problem and introduces the relevant notation. Sections 3.1 and 3.2 review the exact and approximate algorithms, respectively, while Section 4 presents the datasets we used for experimental evaluation and how they were obtained. The experimental results follow in Section 5; this section already gives some insights into what algorithms are promising, and into the role of the problem difficulty in shaping the performance landscape. We further examine these findings in Section 6. Section 7 discusses the different regimes of difficulty of the Kemeny ranking problem and proposes a simple heuristic to distinguish between them in practice. The paper concludes with Section 8.
Section snippets
The Kemeny ranking problem
Given a profile of $N$ rankings $\pi_1, \dots, \pi_N$ over $n$ alternatives, the Kemeny ranking problem is the problem of finding the ranking $$\pi^{*} = \operatorname*{argmin}_{\pi} \sum_{i=1}^{N} d(\pi, \pi_i). \quad (1)$$ In the above, $d(\pi, \sigma)$ represents the Kendall distance between two permutations, defined as the minimum number of adjacent transpositions needed to turn $\pi$ into $\sigma$. The ranking $\pi^{*}$ is thus the ranking that minimizes the total number of disagreements with the profile.
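The definitions above can be made concrete with a short sketch (illustrative only; the function names are our own, and the exhaustive search over all $n!$ rankings is of course feasible only for very small $n$):

```python
from itertools import combinations, permutations

def kendall_distance(pi, sigma):
    """Number of pairs of alternatives that pi and sigma order oppositely
    (equivalently, the minimum number of adjacent transpositions needed
    to turn one permutation into the other)."""
    pos_pi = {a: i for i, a in enumerate(pi)}
    pos_sigma = {a: i for i, a in enumerate(sigma)}
    return sum(
        1
        for a, b in combinations(pi, 2)
        if (pos_pi[a] - pos_pi[b]) * (pos_sigma[a] - pos_sigma[b]) < 0
    )

def kemeny_ranking_brute_force(profile):
    """Exhaustively minimize the total Kendall distance to the profile."""
    alternatives = profile[0]
    return min(
        permutations(alternatives),
        key=lambda pi: sum(kendall_distance(pi, sigma) for sigma in profile),
    )

profile = [("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]
print(kemeny_ranking_brute_force(profile))  # ('a', 'b', 'c')
```

The NP-hardness mentioned in the introduction is exactly why this brute-force search is impractical and a zoo of exact and approximate algorithms exists.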
Algorithms
We aimed for an ensemble of algorithms that would allow for various trade-offs between performance and accuracy. We first discuss the algorithms we experimented with that return an exact solution to the Kemeny ranking problem. Then, we discuss algorithms that return an approximate solution.
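To illustrate what a fast approximate algorithm in such an ensemble might look like, here is a sketch of the Borda count, a classic cheap heuristic for rank aggregation (an illustration of the general idea, not necessarily one of the specific variants compared in the paper):

```python
from collections import defaultdict

def borda_ranking(profile):
    """Rank alternatives by their total position across the profile
    (lower total position = ranked higher).  Borda runs in time linear
    in the input plus a sort, but carries no optimality guarantee for
    the Kemeny objective."""
    totals = defaultdict(int)
    for ranking in profile:
        for position, alternative in enumerate(ranking):
            totals[alternative] += position
    return tuple(sorted(totals, key=totals.get))

profile = [("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]
print(borda_ranking(profile))  # ('a', 'b', 'c')
```

Exact methods such as the integer program or branch and bound sit at the opposite end of the trade-off: guaranteed optimal solutions at potentially exponential cost.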
Datasets
In this section, we discuss the datasets that we used for our experiments. In collecting these datasets, we aimed for a variety of lengths (the number of alternatives, $n$), numbers of rankings ($N$), and degrees of consensus. To this end, we used real-world and synthetic datasets. We first describe the real-world datasets and then move on to the synthetic ones.
Methodology
We ran each algorithm described above on each data set (i.e., 104 algorithms × 49 data sets) and recorded the running time, resulting ranking, and other variables.
As a measure of algorithm accuracy we use the number of pairwise disagreements between the algorithm’s output ranking $\hat{\pi}$ and all the data, i.e. $$\mathrm{cost}(\hat{\pi}) = \frac{1}{N} \sum_{i=1}^{N} d(\hat{\pi}, \pi_i).$$ The above is nothing else than the rescaled r.h.s. of Eq. (1), so it reflects how well the algorithm has optimized the desired objective. We call this objective the cost of $\hat{\pi}$.
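A sketch of this accuracy measure (the normalization by $N$ is our assumption for the "rescaling"; the paper's exact constant may differ):

```python
from itertools import combinations

def kendall_distance(pi, sigma):
    """Count inversions between pi and sigma: map pi's elements to their
    positions in sigma, then count out-of-order pairs."""
    pos = {a: i for i, a in enumerate(sigma)}
    seq = [pos[a] for a in pi]
    return sum(1 for i, j in combinations(range(len(seq)), 2) if seq[i] > seq[j])

def cost(pi_hat, profile):
    """Pairwise disagreements of pi_hat with the whole profile, rescaled
    by the number of rankings N (assumed normalization)."""
    return sum(kendall_distance(pi_hat, sigma) for sigma in profile) / len(profile)

profile = [("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]
print(cost(("a", "b", "c"), profile))  # 2/3, i.e. 0.666...
```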
The Pareto curves and general observations
Across all experiments described here we observe the following. First, no algorithm reaches both the lowest cost and lowest runtime simultaneously over all experiments. This is why in our experimental results, we traced the Pareto boundary, i.e. the set of algorithms that cannot be bettered in both cost and running time by another algorithm.
The shape of the Pareto boundary tells us about the possible trade-offs between run time and accuracy. It is easy to observe that in practically all
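Given the recorded (cost, running time) pairs, the Pareto boundary is straightforward to compute; the following sketch, with made-up algorithm names and numbers, shows one way:

```python
def pareto_boundary(results):
    """Keep the (name, cost, runtime) triples that are not dominated,
    i.e. no other entry is at least as good on both axes and strictly
    better on at least one."""
    return [
        (name, c, t)
        for name, c, t in results
        if not any(
            c2 <= c and t2 <= t and (c2 < c or t2 < t)
            for _, c2, t2 in results
        )
    ]

# Hypothetical results, for illustration only.
results = [("ILP", 10.0, 120.0), ("local search", 11.0, 3.0),
           ("Borda", 14.0, 0.1), ("slow heuristic", 15.0, 200.0)]
print([name for name, _, _ in pareto_boundary(results)])
# ['ILP', 'local search', 'Borda']
```

The slow heuristic is dominated by local search (worse cost and worse runtime), so it drops off the boundary; the other three each win on at least one axis.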
Regimes of difficulty: strong, weak, and no consensus
In the tasks above, we presented the algorithms with difficulties ranging from hard to easy. The reader has probably noted that the success of the algorithms (and implicitly the difficulty of the problem) did not depend on the problem size alone, but also on the data distribution. In particular, the concentration of the data around the Kemeny ranking had a strong influence on the algorithms’ ability to optimize the cost, and for some algorithms (like B&B) it also influenced the running time. We now discuss
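One simple proxy for this concentration (our illustration, not the paper's proposed test) is the mean pairwise Kendall distance within the profile, normalized to [0, 1] by the maximum distance $n(n-1)/2$; values near 0 suggest strong consensus, values near 1 suggest none:

```python
from itertools import combinations

def kendall_distance(pi, sigma):
    """Count inversions between pi and sigma."""
    pos = {a: i for i, a in enumerate(sigma)}
    seq = [pos[a] for a in pi]
    return sum(1 for i, j in combinations(range(len(seq)), 2) if seq[i] > seq[j])

def dispersion(profile):
    """Mean pairwise Kendall distance over the profile, normalized by the
    maximum possible distance n(n-1)/2.  A crude consensus proxy."""
    n = len(profile[0])
    pairs = list(combinations(profile, 2))
    total = sum(kendall_distance(p, q) for p, q in pairs)
    return total / (len(pairs) * n * (n - 1) / 2)

# Five identical votes plus one dissenter: dispersion stays low.
strong = [("a", "b", "c")] * 5 + [("a", "c", "b")]
print(round(dispersion(strong), 3))  # 0.111
```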
Conclusions
We have performed an extensive comparison of algorithms for the Kemeny rank aggregation problem, originating in social choice theory, machine learning, and theoretical computer science. The problem being NP-hard, the focus has been on establishing the best trade-offs between search time and performance.
Acknowledgments
We thank the anonymous reviewers for the constructive feedback that significantly improved this paper. MM acknowledges partial support from NSF award IIS-0535100.
References (31)
- Ailon, N., et al. Aggregating inconsistent information: ranking and clustering. Journal of the ACM (2008)
- Arrow, K.J. Social Choice and Individual Values (1963)
- Bartholdi, J., et al. Voting schemes for which it can be difficult to tell who won. Social Choice and Welfare (1989)
- Betzler, N., et al. Fixed-parameter algorithms for Kemeny rankings. Theoretical Computer Science (2009)
- Betzler, N., et al. Partial kernelization for rank aggregation: theory and experiments
- Chanas, S., Kobylanski, P. A new heuristic algorithm solving the linear ordering problem. Computational Optimization and Applications (1996)
- Cohen, W.W., et al. Learning to order things. Journal of Artificial Intelligence Research (1998)
- Coleman, T., Wirth, A. Ranking tournaments: local search and a new algorithm. In: Proceedings of the Workshop on... (2008)
- Condorcet, M.J.A.N. de. Memoire sur les Elections au Scrutin (1781)
- Conitzer, V., Davenport, A., Kalagnanam, J. Improved bounds for computing Kemeny rankings. In: Proceedings of... (2006)
- Diaconis, P., Graham, R.L. Spearman’s footrule as a measure of disarray. Journal of the Royal Statistical Society, Series B (1977)
- Efron, B., Tibshirani, R.J. An Introduction to the Bootstrap (1993)