Experiments with Kemeny ranking: What works when?

https://doi.org/10.1016/j.mathsocsci.2011.08.008

Abstract

This paper performs a comparison of several methods for Kemeny rank aggregation (104 algorithms and combinations thereof in total) originating in social choice theory, machine learning, and theoretical computer science, with the goal of establishing the best trade-offs between search time and performance. We find that, for this theoretically NP-hard task, in practice the problems span three regimes: strong consensus, weak consensus, and no consensus. We make specific recommendations for each, and propose a computationally fast test to distinguish between the regimes.

In spite of the great variety of algorithms, there are few classes that are consistently Pareto optimal. In the most interesting regime, the integer program exact formulation, local search algorithms and the approximate version of a theoretically exact branch and bound algorithm arise as strong contenders.

Highlights

► We compare 104 algorithms for Kemeny rank aggregation across several datasets.
► We find that the problem instances span 3 regimes: strong, weak, and no consensus.
► We make specific algorithmic recommendations for each regime.
► We also propose a computationally fast test to distinguish between the 3 regimes.
► The integer program, local search, and branch and bound algorithms perform well.

Introduction

Preference aggregation has been extensively studied by economists under social choice theory. Arrow discussed certain desirable properties that a ranking rule must have and showed that no rule can simultaneously satisfy them all (Arrow, 1963). Thus, a variety of models of preference aggregation have been proposed, each of which satisfy subsets of properties deemed desirable. In theoretical computer science, too, many applications of preference aggregation exist, including merging the results of various search engines (Cohen et al., 1998, Dwork et al., 2001), collaborative filtering (Pennock et al., 2000), and multiagent planning (Ephrati and Rosenschein, 1993).

The Kemeny ranking rule (Kemeny and Snell, 1962) is widely used in both of these areas. From Arrow’s Axioms’ point of view, the Kemeny ranking is the unique rule satisfying the independence of irrelevant alternatives and the reinforcement axiom (Young and Levenglick, 1978). More recently, it has been found to have a natural interpretation as the maximum likelihood ranking under a very simple noise model proposed by Condorcet (Young, 1995). The same noise model was proposed independently in statistics by Mallows (1957) and refined by Fligner and Verducci (1986), under the name Mallows’ model. Further extensions that go beyond the scope of this paper but that are relevant to modeling preferences have been proposed by Mao and Lebanon (2008) and Meilă and Bao (2010).
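The noise model mentioned above places probability on rankings that decays exponentially with their Kendall distance from a central ranking. As an illustrative sketch (not the paper's code; the dispersion parameter theta and the brute-force normalization over all permutations are our own simplifications, feasible only for small n):

```python
import math
from itertools import permutations

def kendall_distance(a, b):
    """Minimum number of adjacent transpositions turning ranking a into b."""
    pos = {item: i for i, item in enumerate(b)}
    seq = [pos[item] for item in a]
    # Count inversions; O(n^2) is fine at illustrative sizes.
    return sum(1 for i in range(len(seq)) for j in range(i + 1, len(seq))
               if seq[i] > seq[j])

def mallows_pmf(center, theta, items):
    """P(pi) proportional to exp(-theta * d(pi, center)), Mallows' model."""
    perms = list(permutations(items))
    weights = [math.exp(-theta * kendall_distance(p, center)) for p in perms]
    z = sum(weights)  # normalizing constant, computed by brute force
    return {p: w / z for p, w in zip(perms, weights)}

pmf = mallows_pmf(center=(1, 2, 3), theta=1.0, items=(1, 2, 3))
best = max(pmf, key=pmf.get)  # the central ranking is the mode
```

Under this model, the Kemeny ranking of a sample of votes coincides with the maximum likelihood estimate of the central ranking, which is what gives the rule its statistical interpretation.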

Finding the Kemeny ranking is unfortunately a computational challenge, since the problem is NP-hard even for only four votes (Bartholdi et al., 1989, Cohen et al., 1998). Since the problem is important across a variety of fields, many researchers across these fields have converged on finding good, practical algorithms for its solution. There are formulations of the problem that lead to exact algorithms, of course without polynomial running time guarantees, and we present two of these in Section 3.1. There are also a large number of heuristic and approximate algorithms, and we enumerate several classes of these algorithms in Section 3.2.

Surveying the various approaches is only the first step, since we are interested in finding which algorithms (or combinations of algorithms) are likely to perform well in practice. For this, we perform a systematic comparison, running all algorithms on a batch of real-world and synthetic Kemeny ranking tasks.

Since we are attacking an intractable task, what we can examine is the trade-off between computational effort and quality of the obtained solution. We will examine which algorithms, if any, are systematically achieving optimal trade-offs. We hope thereby to be able to recommend reliable methods to practitioners.

Such an analysis was recently undertaken by Schalekamp and van Zuylen (2009), henceforth abbreviated as SvZ, with whose work several parallels exist. While our experimental setup adds little to theirs, we differ fundamentally in the conclusions we draw. That is because we factor in the difficulty of the actual problem, something that SvZ do not consider. Since both the performance and the running time of an algorithm can change with the difficulty of the problem, we expect that the “best” algorithm to use will differ depending on the operating regime.

Hence, we propose several natural measures of difficulty for the Kemeny ranking problem, and re-examine the algorithms’ performance results from this perspective.

The rest of the paper is structured as follows. The next section formally defines the Kemeny ranking problem and introduces the relevant notation. Sections 3.1 Exact algorithms, 3.2 Approximate algorithms review the algorithms, while Section 4 presents the datasets we used for experimental evaluation and how they were obtained. The experimental results follow in Section 5; this section already gives some insights into what algorithms are promising, and into the role of the problem difficulty in shaping the performance landscape. We further examine these findings in Section 6. Section 7 discusses the different regimes of difficulty of the Kemeny ranking problem and proposes a simple heuristic to distinguish between them in practice. The paper concludes with Section 8.

Section snippets

The Kemeny ranking problem

Given a profile of $N$ rankings $\pi_1, \pi_2, \ldots, \pi_N$ over $n$ alternatives, the Kemeny ranking problem is the problem of finding the ranking $\pi_0 = \arg\min_{\pi} \frac{1}{N}\sum_{i=1}^{N} d(\pi_i, \pi)$. In the above, $d(\pi, \pi')$ represents the Kendall distance between two permutations $\pi, \pi'$, defined as the minimum number of adjacent transpositions needed to turn $\pi$ into $\pi'$. The ranking $\pi_0$ is the ranking that minimizes the total number of disagreements with the
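The objective above can be evaluated directly by exhaustive search. A minimal sketch (our own illustration, feasible only for small n since it enumerates all n! rankings):

```python
from itertools import permutations

def kendall_distance(a, b):
    """Minimum number of adjacent transpositions turning ranking a into b."""
    pos = {item: i for i, item in enumerate(b)}
    seq = [pos[item] for item in a]
    # Number of inversions of a relative to b.
    return sum(1 for i in range(len(seq)) for j in range(i + 1, len(seq))
               if seq[i] > seq[j])

def kemeny_brute_force(profile):
    """Return a ranking minimizing the total Kendall distance to the profile."""
    items = profile[0]
    def total_distance(candidate):
        return sum(kendall_distance(pi, candidate) for pi in profile)
    return min(permutations(items), key=total_distance)

profile = [(1, 2, 3), (1, 2, 3), (2, 1, 3)]
kemeny = kemeny_brute_force(profile)  # -> (1, 2, 3)
```

The exponential search space is exactly why the exact and approximate algorithms surveyed in Section 3 are needed for realistic n.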

Algorithms

We aimed for an ensemble of algorithms that would allow for various trade-offs between running time and accuracy. We first discuss the algorithms we experimented with that return an exact solution to the Kemeny ranking problem. Then, we discuss algorithms that return an approximate solution.

Datasets

In this section, we discuss the datasets that we used for our experiments. In collecting these datasets, we aimed for a variety of lengths (n), number of rankings (N), and degrees of consensus. To this end, we used real-world and synthetic datasets. We first describe the real-world and then move on to the synthetic datasets.

Methodology

We ran each algorithm described above on each data set (i.e. 104 algorithms × 49 data sets) and recorded the running time, resulting ranking, and other variables.

As a measure of algorithm accuracy we use the number of pairwise disagreements between the algorithm’s output ranking and all the data, i.e. $\mathrm{cost}(\pi_0) = \sum_{i=1}^{N} d(\pi_i, \pi_0)$. The above is nothing else than the rescaled r.h.s. of Eq. (1), so it reflects how well the algorithm has optimized the desired objective. We call this objective the cost of
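The cost need not be computed by re-scanning all N votes for each candidate ranking. A common device (our own sketch, not necessarily the paper's implementation) is to precompute a pairwise tally Q, where Q[(a, b)] counts votes placing a before b; the cost of any ranking is then the number of vote-pairs it contradicts:

```python
def pairwise_tally(profile):
    """Q[(a, b)] = number of rankings in the profile placing a before b."""
    q = {}
    for pi in profile:
        for i, a in enumerate(pi):
            for b in pi[i + 1:]:
                q[(a, b)] = q.get((a, b), 0) + 1
    return q

def kemeny_cost(candidate, q):
    """Total pairwise disagreements with the profile, sum_i d(pi_i, candidate)."""
    cost = 0
    for i, a in enumerate(candidate):
        for b in candidate[i + 1:]:
            # Every vote preferring b over a disagrees with placing a first.
            cost += q.get((b, a), 0)
    return cost

profile = [(1, 2, 3), (1, 2, 3), (2, 1, 3)]
q = pairwise_tally(profile)
kemeny_cost((1, 2, 3), q)  # -> 1: only the third vote prefers 2 before 1
```

After the one-time O(N n^2) tally, each candidate ranking is scored in O(n^2) independent of N, which matters when many candidates are evaluated, as in local search.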

The Pareto curves and general observations

Across all experiments described here we observe the following. First, no algorithm reaches both the lowest cost and lowest runtime simultaneously over all experiments. This is why in our experimental results, we traced the Pareto boundary, i.e. the set of algorithms that cannot be bettered in both cost and running time by another algorithm.
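Tracing the Pareto boundary is itself a simple computation. A sketch with made-up (cost, runtime) numbers; the algorithm names are placeholders, not results from the paper:

```python
def pareto_boundary(results):
    """Keep entries not dominated in both cost and running time.

    `results` maps algorithm name -> (cost, runtime); lower is better for both.
    An entry is dominated if some other entry is at least as good on both
    criteria and strictly better on one.
    """
    frontier = {}
    for name, (cost, time) in results.items():
        dominated = any(
            c <= cost and t <= time and (c < cost or t < time)
            for other, (c, t) in results.items() if other != name
        )
        if not dominated:
            frontier[name] = (cost, time)
    return frontier

# Hypothetical numbers for illustration only.
results = {
    "exact_IP": (100, 60.0),       # best cost, slowest
    "local_search": (102, 0.5),
    "greedy": (110, 0.4),          # fastest
    "slow_heuristic": (108, 2.0),  # dominated by local_search
}
front = pareto_boundary(results)
```

With these numbers, `slow_heuristic` drops out because `local_search` beats it on both axes, while the other three each offer a trade-off no competitor dominates.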

The shape of the Pareto boundary tells us about the possible trade-offs between run time and accuracy. It is easy to observe that in practically all

Regimes of difficulty: strong, weak, and no consensus

In the tasks above, we presented the algorithms with difficulties ranging from hard to easy. The reader has probably noted that the success of the algorithms (and implicitly the difficulty of the problem) did not depend on n alone, but also on the data distribution. In particular, the concentration of the data around the Kemeny ranking had a strong influence on the algorithms’ ability to optimize the cost, and for some algorithms (like B&B) it also influenced the running time. We now discuss
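One natural way to quantify the concentration of the data is the average pairwise Kendall distance among the votes themselves, normalized by its maximum. This is a generic illustration of such a measure, not the specific test the paper proposes in Section 7:

```python
from itertools import combinations

def kendall_distance(a, b):
    """Minimum number of adjacent transpositions turning ranking a into b."""
    pos = {item: i for i, item in enumerate(b)}
    seq = [pos[item] for item in a]
    return sum(1 for i in range(len(seq)) for j in range(i + 1, len(seq))
               if seq[i] > seq[j])

def dispersion(profile):
    """Average pairwise Kendall distance between votes, normalized to [0, 1].

    Values near 0 indicate votes clustered tightly around a common ranking
    (strong consensus); larger values indicate weak or no consensus.
    """
    n = len(profile[0])
    max_d = n * (n - 1) / 2  # distance between a ranking and its reversal
    pairs = list(combinations(profile, 2))
    return sum(kendall_distance(a, b) for a, b in pairs) / (len(pairs) * max_d)
```

For instance, a profile of identical votes has dispersion 0, while a profile containing a ranking and its exact reversal has dispersion 1.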

Conclusions

We have performed an extensive comparison of algorithms for the Kemeny rank aggregation problem, originating in social choice theory, machine learning, and theoretical computer science. The problem being NP-hard, the focus has been on establishing the best trade-offs between search time and performance.

Acknowledgments

We thank the anonymous reviewers for the constructive feedback that significantly improved this paper. MM acknowledges partial support from NSF award IIS-0535100.

References (31)

  • N. Betzler et al., Fixed-parameter algorithms for Kemeny rankings. Theoretical Computer Science (2009)
  • N. Ailon et al., Aggregating inconsistent information: ranking and clustering.
  • K.J. Arrow, Social Choice and Individual Values (1963)
  • J. Bartholdi et al., Voting schemes for which it can be difficult to tell who won. Social Choice and Welfare (1989)
  • N. Betzler et al., Partial kernelization for rank aggregation: theory and experiments.
  • J. Borda, Memoire sur les Elections au Scrutin (1781)
  • S. Chanas et al., A new heuristic algorithm solving the linear ordering problem. Computational Optimization and Applications (1996)
  • W.W. Cohen et al., Learning to order things. Journal of Artificial Intelligence Research (1998)
  • Coleman, T., Wirth, A., 2008. Ranking tournaments: local search and a new algorithm. In: Proceedings of the Workshop on...
  • Conitzer, V., Davenport, A., Kalagnanam, J., 2006. Improved bounds for computing Kemeny rankings. In: Proceedings of...
  • Copeland, A., 1951. A ‘reasonable’ social welfare function. Technical Report. Seminar on Applications of Mathematics to...
  • Davenport, A., Kalagnanam, J., 2004. A computational study of the Kemeny rule for preference aggregation. In:...
  • P. Diaconis et al., Spearman’s footrule as a measure of disarray. Journal of the Royal Statistical Society, Series B (1977)
  • Dwork, C., Kumar, R., Naor, M., Sivakumar, D., 2001. Rank aggregation methods for the web. In: Proceedings of the 10th...
  • B. Efron et al., An Introduction to the Bootstrap (1993)