Utopia in the solution of the Bucket Order Problem

doi:10.1016/j.dss.2017.03.006

Decision Support Systems

Volume 97, May 2017, Pages 69-80

https://doi.org/10.1016/j.dss.2017.03.006

Highlights

• The standard greedy algorithm for OBOP, Bucket Pivot Algorithm (BPA), is significantly improved.
• The concept of utopian matrix is introduced and used to carry out an informed selection of the pivot.
• Several items are allowed to “give their opinion” during the ordering process, which yields a multi-pivot instead of single-pivot approach.
• Decision rules are provided to select the right BPA-based algorithm according to the decision maker preferences.

Abstract

This paper deals with group decision making and, in particular, with rank aggregation, which is the problem of aggregating individual preferences (rankings) in order to obtain a consensus ranking. Although this consensus ranking is usually a permutation of all the ranked items, in this paper we tackle the situation in which some items can be tied, that is, the consensus shows that there is no preference among them. This problem has arisen recently and is known as the Optimal Bucket Order Problem (OBOP).

In this paper we propose two improvements to the standard greedy algorithm usually considered to approach the bucket order problem: the Bucket Pivot Algorithm (BPA). The first improvement is based on the introduction of the Utopian Matrix, a matrix associated with a pair order matrix that represents the precedences in a collection of rankings. This idealization constitutes a superoptimal solution to the OBOP, which can be used as an extreme (sometimes feasible) best value. The second improvement is based on the use of several items as pivots to generate the bucket order, in contrast to BPA, which uses only a single pivot. The set of items playing the role of decision-maker is dynamically created. We analyze separately the contribution of each improvement and also their joint effect. The statistical analysis of the experiments carried out shows that the combined use of both techniques is the best choice, showing a significant improvement in accuracy (17%) with respect to the original BPA and providing an important reduction in the variance of the output. Moreover, we provide decision rules to help the decision maker select the right algorithm according to the problem instance.

Introduction

This paper falls in the field of group decision making (GDM), a problem in which several agents (individuals, experts, software agents, organizations, etc.) express their opinions regarding a decision making problem, and it is necessary to reach a consensus among them [1]. Although a GDM problem may be solved by selecting one of the proposed alternatives, in this way not all the agents' particular preferences would be considered properly. Because of this, in many methodologies for GDM the operation of reaching a consensus is even considered as an additional phase of the GDM process.

Different approaches can be followed to deal with the GDM problem [2] and, in particular, with consensus reaching [1], many of them based on the use of fuzzy set theory [3]. A simple taxonomy of consensus reaching approaches is provided in [1] based on two dimensions: whether or not a feedback mechanism is allowed [4], [5], and whether alternatives are evaluated by the distance between experts or by the distance to the collective preference [6], [7].

In this paper we approach the GDM from the perspective of social choice and voting theories. In particular, our contribution is located in rank aggregation, a typical preference learning [8], [9] problem with many applications to decision making. The goal of rank aggregation is to combine a set of individual preferences or precedences, expressed by different agents in the form of rankings over (some of) the provided items or alternatives, into a consensus ranking which represents the collective opinion of the agents involved. Regarding the taxonomy in [1], this problem mainly falls in the category of consensus models without a feedback mechanism and with a consensus measure based on computing pairwise similarities.

Rank aggregation methods have traditionally been applied in marketing, advertisement research and applied psychology, and, as pointed out in [10], “more recently they have emerged as an important tool to combine information coming from different internet search engines or from different omics-scale biological studies” [11], [12], [13], [14], [15]. In the field of information and decision support systems, rank aggregation also has broad applicability, with applications that include: selecting the right information system in the context of a business application [16]; assisting in the process of discovering the cloud service candidates that have the highest customer satisfaction [17]; estimating the effort and cost of developing an information system [18]; and automating the process of data integration by matching concepts which describe the meaning of data in various data sources (database schemata, XML, DTDs, etc.) [19]. Apart from its application as an end in itself, solving the rank aggregation problem is also used as a building block in problems that involve estimating the consensus permutation many times, e.g. optimization [20] and machine learning [21].

However, not all the previous applications solve the same rank aggregation problem, as this is a general term which embraces several problems. Thus, when all the agents give a complete and strict precedence ranking of the items, that is, a permutation, the problem is known as the Kemeny ranking problem (KRP) [22], [23]. The term rank aggregation problem (RAP) is usually considered as a generalization of the KRP, allowing the agents to produce (in)complete rankings with or without ties [24]. Both problems, KRP and RAP, have in common that the solution is a permutation (i.e. a complete ranking without ties) defined over all the items. KRP and RAP are NP-complete [24], [25], so heuristic greedy algorithms are usually employed to tackle them [23], [26], [27], [28], [29], [30], [31].

In this paper we focus on a more general, or flexible, problem, which allows us to obtain a ranking with ties as consensus. The use of ties in the solution arises as a more natural option when no strict preferences are individually or collectively given by the agents. For example, let us consider a set of rankings in which none of the agents individually expresses any preference between items 1 and 2, that is, they are tied in all the rankings given by the agents. So, why must this tie be broken in the consensus ranking? In other cases, the ties may arise from the collective opinion. For example, if we have the rankings¹ {1|2|3|4, 2|1|3|4, 1|2|4|3, 2|1|4|3}, then it is obvious that the four agents agree that i is better than j for i ∈ {1,2} and j ∈ {3,4}, but there is no consensus with respect to the preference between 1 and 2, and between 3 and 4. Hence, the most reasonable solution in this case would be 1,2|3,4.
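The collective tie in this example can be checked numerically. The following is only an illustrative sketch: it assumes a pair order matrix C in which C(u, v) is the fraction of rankings placing u before v, with a tie counting 0.5 in each direction (the formal definition is not reproduced in this excerpt), and the helper names `parse_ranking` and `pair_order_matrix` are hypothetical.

```python
from itertools import combinations

def parse_ranking(text):
    """Parse a ranking such as '1,2|3,4' into buckets: [[1, 2], [3, 4]]."""
    return [[int(x) for x in bucket.split(",")] for bucket in text.split("|")]

def pair_order_matrix(rankings, items):
    """Return C as a dict with C[(u, v)] = share of rankings placing u before v.

    Assumption: a tie between u and v adds 0.5 to both C[(u, v)] and C[(v, u)].
    Every ranking is expected to contain every item (no incomplete rankings).
    """
    C = {(u, v): 0.0 for u in items for v in items if u != v}
    for ranking in rankings:
        # Map each item to the index of the bucket that contains it.
        position = {u: k for k, bucket in enumerate(ranking) for u in bucket}
        for u, v in combinations(items, 2):
            if position[u] < position[v]:
                C[(u, v)] += 1.0
            elif position[u] > position[v]:
                C[(v, u)] += 1.0
            else:
                C[(u, v)] += 0.5
                C[(v, u)] += 0.5
    return {pair: total / len(rankings) for pair, total in C.items()}

rankings = [parse_ranking(r) for r in ("1|2|3|4", "2|1|3|4", "1|2|4|3", "2|1|4|3")]
C = pair_order_matrix(rankings, items=[1, 2, 3, 4])
print(C[(1, 2)], C[(3, 4)], C[(1, 3)], C[(2, 4)])  # prints: 0.5 0.5 1.0 1.0
```

Under this reading, the pairs (1, 2) and (3, 4) sit exactly at 0.5, while every item in {1, 2} precedes every item in {3, 4} with value 1.0, which matches the consensus 1,2|3,4.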

Dealing with rank aggregation while allowing ties in the solution or consensus ranking is known as the Optimal Bucket Order Problem (OBOP) [32], [33]. In addition to the real-world applications inherited from the rank aggregation problem, as reported in [34], the OBOP has also been applied “in the context of seriation problems in scientific disciplines, such as Paleontology, Archaeology and Ecology”. In this paper we propose several substantial improvements to the greedy algorithm which currently constitutes the standard approach to solve the OBOP. Since these improvements lead to different BPA-based algorithms, we obtain decision rules to support the decision maker in the process of selecting the best method according to their preferences and/or the problem instance features.

The rest of the paper is organized as follows. In Section 2 we motivate our work by highlighting the weaknesses of the standard algorithm used to solve the OBOP, and state our research goal. Next, in Section 3 we describe the OBOP and the BPA algorithm, introducing the notation to be used throughout this work. Section 4 presents the concept of the Utopian Matrix and some other derived notions. In Section 5 we introduce the modifications proposed for the BPA in the case where only one item is used as pivot, which involves changing the way of selecting it. Section 6 is devoted to presenting an experimental study that confirms that the proposed modifications outperform the original BPA. In Section 7 we extend the previous results to the multi-pivot case. Then, in Section 8 we perform an experimental study for all the proposed algorithms. Finally, in Section 9 we discuss our results.

Section snippets

Motivation and research goal

As might be expected, the OBOP is NP-complete [32], and so several heuristic greedy approaches have been proposed to tackle it. In [35] a heuristic algorithm is designed to obtain the consensus bucket order from a set of full rankings (permutations). A more general/flexible approach, which does not limit the kind of input rankings, is the Bucket Pivot Algorithm (BPA) [32], [33]. This algorithm has a clear resemblance to quicksort. It starts with the random selection of an item as pivot and

The Optimal Bucket Order Problem (OBOP)

In this section we introduce the notation and formalize the OBOP. Given a set of items [[n]] = {1,...,n}, a bucket order ℬ is an ordered partition of [[n]] [32], [33], [37]. More precisely, it is a linear ordering of disjoint subsets (buckets) $B_{1}, B_{2}, \dots, B_{k}$ of [[n]], 1 ≤ k ≤ n, with $\cup_{i = 1}^{k} B_{i} = [[n]]$. Thus, given two buckets B_i, B_j in ℬ, we will write $B_{i} \prec_{\mathcal{B}} B_{j}$ to indicate that B_i precedes B_j according to the bucket order ℬ. Analogously, given two items u ∈ B_i, v ∈ B_j, we will write $u \prec_{\mathcal{B}} v$ if $B_{i} \prec_{\mathcal{B}} B_{j}$. All the

Utopian matrix and its implications for pivot selection

In this section we introduce the utopian matrix and related concepts.

Definition 1

Given a pair order matrix C, the utopian matrix U_C associated with C is the n × n matrix defined as $U_{C}(u, v) = \Upsilon(C(v, u))$, where

$$\Upsilon(x) = \begin{cases} 1 & \text{if } x > 0.75 \\ 0.5 & \text{if } 0.25 \leq x \leq 0.75 \\ 0 & \text{if } x < 0.25 \end{cases}$$

Then, the utopia value u_C associated with C is u_C = D(U_C, C).
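To make Definition 1 concrete, the threshold function ϒ can be sketched as a small helper. This is a minimal illustration rather than the authors' implementation: the names `upsilon` and `utopian_matrix` are hypothetical, the matrix is assumed to be stored as a list of lists indexed from 0, and the indices follow the definition as printed above. The distance D is not defined in this excerpt, so the utopia value is not computed here.

```python
def upsilon(x):
    """Threshold function of Definition 1: maps x in [0, 1] to 0, 0.5 or 1."""
    if x > 0.75:
        return 1.0
    if x < 0.25:
        return 0.0
    return 0.5

def utopian_matrix(C):
    """Build U_C from a pair order matrix C stored as a list of lists.

    Indices follow Definition 1 as printed: U_C(u, v) = upsilon(C(v, u)).
    """
    n = len(C)
    return [[upsilon(C[v][u]) for v in range(n)] for u in range(n)]

# Hypothetical 3 x 3 pair order matrix with C[u][v] + C[v][u] = 1.
C = [
    [0.5, 0.9, 0.6],
    [0.1, 0.5, 0.2],
    [0.4, 0.8, 0.5],
]
U = utopian_matrix(C)
print(U)  # prints: [[0.5, 0.0, 0.5], [1.0, 0.5, 1.0], [0.5, 0.0, 0.5]]

# Thresholding a value moves it by at most 0.25 (reached at x = 0.25 and x = 0.75).
max_gap = max(abs(upsilon(k / 100) - k / 100) for k in range(101))
print(max_gap)  # prints: 0.25
```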

Note that for any pair order matrix C, the maximum distance between a particular entry and the corresponding one in the associated utopian matrix U_C is 0.25, and this happens only when the value of the entry is

BPA with least indecision assumption

Now, we show how the information provided by the utopian matrix can be used to select the pivot in an informed way. First we define an index to measure the goodness of selecting an item as pivot, and then we propose two different schemes to integrate its use in BPA.

Experimental study of BPA^LIA algorithm(s)

In this section we carry out an experimental comparison between the original BPA and the proposed BPA^LIA algorithms, namely LIA_G and LIA_L. All the experiments have been run on a personal computer with an Intel i7-6700 processor (3.40 GHz, 8 cores) and 16 GB of RAM. All the algorithms have been coded in Prolog.

As a benchmark we use 50 real-world datasets of rankings available at PrefLib [39]. In particular, we downloaded the pwg files³

Using multiple pivots

The BPA and BPA^LIA algorithms use a single item as pivot to decide in which list (L, S or R) the remaining items are placed (see Fig. 1). However, it seems plausible to progressively use the information provided by the items placed in the list containing the pivot (S), since this list will remain as a bucket itself in the resulting bucket order. From now on, we call this approach multi-pivot (MP).
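For orientation, the single-pivot placement into L, S and R that the multi-pivot approach builds on can be sketched as follows. This is a hedged reconstruction, not the authors' code: the 0.25 and 0.75 cut-offs mirror the thresholds of ϒ and are an assumption here, as is the convention that C[u][v] close to 1 means that u precedes v. The function name `bpa` and the `rng` parameter are hypothetical.

```python
import random

def bpa(items, C, rng=None, low=0.25, high=0.75):
    """Quicksort-like sketch of a single-pivot bucket pivot scheme.

    C[u][v] is read as the degree to which u precedes v (assumption).
    Returns the bucket order as a list of buckets (sorted lists of items).
    """
    if not items:
        return []
    if rng is None:
        rng = random.Random()
    pivot = rng.choice(items)  # random pivot, as in the original BPA
    left, same, right = [], [pivot], []
    for u in items:
        if u == pivot:
            continue
        if C[pivot][u] < low:      # u clearly precedes the pivot -> list L
            left.append(u)
        elif C[pivot][u] > high:   # the pivot clearly precedes u -> list R
            right.append(u)
        else:                      # no clear preference -> tied with pivot (S)
            same.append(u)
    # S stays as a bucket; L and R are ordered recursively.
    return bpa(left, C, rng, low, high) + [sorted(same)] + bpa(right, C, rng, low, high)

# Hypothetical pair order matrix for four items, stored as nested dicts.
C = {
    1: {2: 0.5, 3: 1.0, 4: 1.0},
    2: {1: 0.5, 3: 1.0, 4: 1.0},
    3: {1: 0.0, 2: 0.0, 4: 0.5},
    4: {1: 0.0, 2: 0.0, 3: 0.5},
}
print(bpa([1, 2, 3, 4], C, rng=random.Random(0)))  # prints: [[1, 2], [3, 4]]
```

On this small instance every pivot choice yields the same bucket order; in general the result of a single-pivot scheme depends on the pivot drawn, which is the sensitivity that informed and multi-pivot selection aim to reduce.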

In order to let all the items included in (S) intervene in the process of placing each new item, we

Experimental analysis

In order to explore the advantages of the multi-pivot approach we carry out a new set of experiments using the same benchmark as in Section 6. Regarding the algorithms, we consider the combination of the three BPA approaches discussed in the previous sections with the two multi-pivot strategies (MP and MP2). Consequently, we introduce six new algorithms called: BPA^MP, BPA^MP2, LIA_G^MP, LIA_G^MP2, LIA_L^MP and LIA_L^MP2. Furthermore, in our experimental study we also include the three single-pivot

Improving BPA

In Section 2 we identified the main weaknesses of the BPA algorithm and outlined our ideas to overcome them. Next, we summarize how our proposals have succeeded.

First, we pointed out the use of a random pivot as the most critical decision in BPA. To overcome this drawback, we proposed to select the pivot in an informed way. To do this, we introduced the theoretical concept of Utopian Matrix and showed how it may be used to evaluate the precedence matrices that are the input for the OBOP.

Acknowledgements

This work was partially financed by the Junta de Comunidades de Castilla-La Mancha, Universidad de Castilla-La Mancha and FEDER funds by means of the projects PEII-2014-049 and CCI-2014ES16RFOP010.

Juan A. Aledo received the M.S. degree in Mathematics in 1997 and the Ph.D. degree in Mathematics in 2000, both from the University of Murcia, Spain. He joined the Department of Mathematics at the University of Castilla-La Mancha (UCLM) in 1997, where he is currently a Full Professor. His main research interests include differential geometry, discrete mathematics, decision making and machine learning. In these topics Dr. Aledo has (co)authored more than sixty papers in journals, books and

References (42)

I. Palomares et al. Consensus under a fuzzy context: taxonomy, analysis framework AFRYCA and experimental case of study. Inf. Fusion (2014)

Z. Wu et al. A consistency and consensus based decision support model for group decision making with multiplicative preference relations. Decis. Support. Syst. (2012)

P. Eklund et al. Consensus reaching in committees. Eur. J. Oper. Res. (2007)

Z. Xu et al. Group consensus algorithms based on preference relations. Inform. Sci. (2011)

L. Akritidis et al. Effective rank aggregation for metasearching. J. Syst. Softw. (2011)

S. Ding et al. Utilizing customer satisfaction in ranking prediction for personalized cloud service selection. Decis. Support. Syst. (2017)

A. Ali et al. Experiments with Kemeny ranking: what works when? Math. Soc. Sci. (2012)

I. Contreras. Emphasizing the rank positions in a distance-based aggregation procedure. Decis. Support. Syst. (2011)

J.A. Aledo et al. Using extension sets to aggregate partial rankings in a flexible setting. Appl. Math. Comput. (2016)

S. Amodio et al. Accurate algorithms for identifying the median ranking when dealing with weak and partial rankings under the Kemeny axiomatic approach. Eur. J. Oper. Res. (2016)

G. Napoles et al. Prototypes construction from partial rankings to characterize the attractiveness of companies in Belgium. Appl. Soft Comput. (2016)

Y.L. Chen et al. An approach to group ranking decisions in a dynamic environment. Decis. Support. Syst. (2010)

A. Ukkonen et al. A randomized approximation algorithm for computing bucket orders. Inf. Process. Lett. (2009)

B. Ervural et al. A taxonomy for multiple attribute group decision making literature.

J. Kacprzyk et al. On group decision making, consensus reaching, voting, and voting paradoxes under fuzzy preferences and a fuzzy majority: a survey and a granulation perspective.

E. Herrera-Viedma et al. A consensus model for multiperson decision making with different preference structures. Trans. Syst. Man Cybern. Part A (2002)

J. Fürnkranz et al.

Y. Lu. Implementing an empirical study of rank aggregation approaches based on real world instances. CoRR abs/1402.5259 (2014)

S. Lin. Rank aggregation methods. Wiley Interdiscip. Rev. Computat. Stat. (2010)

M.E. Renda et al. Web metasearch: rank vs. score based rank aggregation methods.

R. Kolde et al. Robust rank aggregation for gene list integration and meta-analysis. Bioinformatics (2012)

Jose A. Gámez received the M.S. degree in Computer Science in 1991, and the Ph.D. degree in Computer Science in 1998, both from the University of Granada, Spain. He joined the Department of Computer Science at the University of Castilla-La Mancha (UCLM) in 1991, where he is currently a Full Professor. His main research interests include probabilistic reasoning, Bayesian networks, metaheuristic algorithms, decision making, machine learning and data mining. In these topics Dr. Gámez has edited six books and six special issues of international journals. He is the (co)author of more than one hundred papers in journals, books and refereed international conferences.

Alejandro Rosete received the M.Sc. degree in applied informatics and the Ph.D. degree in Informatics from Higher Polytechnic Institute Jose Antonio Echeverría (CUJAE), La Habana, Cuba, in 1995 and 2000, respectively. He has been the Head of the Department of Artificial Intelligence and Infrastructure of Informatics Systems, CUJAE. He has published over 40 papers. He is a co-author of the book Lógica y Algoritmos (Editorial Felix Varela, Habana, 2004). His research interests include metaheuristics, agent-oriented software engineering, decision making, data mining, fuzzy systems, and knowledge extraction based on metaheuristics.
