Elsevier

Information Sciences

Volume 585, March 2022, Pages 113-126
Algorithm for computing all the shortest reducts based on a new pruning strategy

https://doi.org/10.1016/j.ins.2021.11.037

Abstract

In this paper, we introduce an algorithm for computing all the shortest reducts in a decision system. The proposed algorithm first determines the size of the shortest reducts, using a small super-reduct and some new pruning methods. Once this size is determined, all the reducts of that size are found by applying the new pruning methods. The results of our experiments on several synthetic and real-world decision systems show that the proposed algorithm is, in most cases, faster than the state-of-the-art algorithms for computing all the shortest reducts reported in the literature.

Introduction

In Rough Set Theory (RST) [1], supervised classification problems are represented through Decision Systems. A Decision System is a table in which the rows and columns represent the objects and attributes, respectively, of a data set. The minimal subsets of attributes that preserve the capability of the whole set of attributes to discern objects from different classes in a decision system are called reducts.
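To make the definition concrete, the following minimal sketch checks whether a subset of attributes is a super-reduct or a reduct of a small decision table. The table `ROWS` and the function names `is_super_reduct` and `is_reduct` are hypothetical, chosen only for illustration; they are not taken from the paper.

```python
from itertools import combinations

# Hypothetical toy decision system: (condition attribute values, class label).
ROWS = [
    ((0, 0, 1), "a"),
    ((0, 1, 1), "b"),
    ((1, 0, 0), "a"),
    ((1, 1, 0), "b"),
    ((0, 1, 0), "b"),
]


def is_super_reduct(rows, attrs):
    """True if `attrs` discerns every pair of objects from different classes
    that the whole set of condition attributes can discern."""
    for (x, dx), (y, dy) in combinations(rows, 2):
        if dx == dy or x == y:
            continue  # same class, or indiscernible even with all attributes
        if all(x[a] == y[a] for a in attrs):
            return False  # this pair is confused by `attrs`
    return True


def is_reduct(rows, attrs):
    """A reduct is a super-reduct from which no attribute can be removed.
    Checking single removals suffices because adding attributes never
    destroys the super-reduct property."""
    attrs = tuple(attrs)
    if not is_super_reduct(rows, attrs):
        return False
    return all(
        not is_super_reduct(rows, attrs[:i] + attrs[i + 1:])
        for i in range(len(attrs))
    )
```

In this toy table the second attribute (index 1) alone separates the two classes, so `(1,)` is a reduct, whereas `(0, 1)` is only a super-reduct because attribute 0 is redundant.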

The problem of computing all the reducts of a decision system has exponential complexity with respect to the number of attributes in the decision system [2]. Thus, several approximate algorithms for reduct computation have been developed using heuristics based on the importance of attributes [3], mutual information [4], and genetic algorithms [5], [6], among others.

The main drawback of these algorithms is that they do not necessarily return the whole set of reducts of a decision system. They sometimes even return super-reducts, i.e., subsets that are not necessarily irreducible. Additionally, for some applications, not all the reducts are needed [7], [8], and some studies have evaluated the feasibility of computing only the shortest reducts instead of computing all of them [9].

Among other areas, the shortest reducts can be used in dimensionality reduction [10], [11], [12], feature selection [13], [14], and building classifiers based on rules [15], [16]. In the worst case, the computation of all the shortest reducts has the same exponential complexity as computing all the reducts. However, in practice, computing only the shortest reducts is generally much faster than computing the whole set of reducts. Thus, in practical applications such as those mentioned above, computing the shortest reducts may be preferred over computing all the reducts.

In this paper, we propose an algorithm for computing all the shortest reducts in a decision system. In the development of this new algorithm, some of the most effective search space pruning methods used in state-of-the-art algorithms to compute all reducts are used, and some new pruning methods are introduced. Our new algorithm is compared against state-of-the-art algorithms, using synthetic and real-world datasets.

The rest of this paper is structured as follows: Section 2 presents the related work; Section 3 contains the theoretical background; Section 4 details our proposal; Section 5 shows and discusses the experimental results; and finally, Section 6 presents our conclusions and future work directions.

Related work

One of the first studies reported in the literature to compute all the shortest reducts appears in [17]. In this study, two algorithms were introduced, one to compute all K-reducts (reducts of length less than or equal to K) and another one to compute all the shortest reducts. Both algorithms are part of the Modified Reducts Generation Algorithm (MRGA), which is the author’s main proposal. In MRGA, the application of absorption laws over the discernibility function was introduced, as a

Basic concepts

This section introduces some basic concepts which provide the theoretical basis needed to understand the proposed algorithm.

The most common way to represent the data of a supervised problem in Rough Set Theory (RST) is through Decision Systems (DS). A Decision System is a pair $T = (U, C \cup \{d\})$, where $U = \{x_1, x_2, \ldots, x_n\}$ is a finite non-empty set of objects called the universe, $C = \{c_1, \ldots, c_k\}$ is a non-empty set of attributes called condition attributes, and $d$ is an attribute, named the decision attribute, such

Proposed algorithm

The proposed algorithm is based on the idea that if the size of the shortest reducts is known a priori, then all the shortest reducts can be computed more efficiently, since, knowing the size, the problem is reduced to finding all the super-reducts of that size. Therefore, our algorithm consists of two sub-tasks: the first one is to find the size of the shortest reducts starting from the size of a short super-reduct found through a heuristic process (see Subsection 4.2), and in the second

Experiments and results

This section presents a comparative analysis of the proposed algorithm against MiLIT [22] and MinReduct [23], which are the fastest state-of-the-art algorithms for computing all the shortest reducts. For our experiments, we implemented the proposed algorithm in Java, and we used the authors’ implementations of MiLIT and MinReduct, which are also written in Java.

Evaluations are performed using real-world datasets and synthetic simplified binary discernibility matrices. The real-world datasets were taken

Conclusions

In this paper, a new algorithm to compute all the shortest reducts was introduced. The proposed algorithm is based on the idea that if the shortest reducts’ size is known a priori, they can be computed more efficiently. Therefore, a new method to quickly find a short super-reduct was proposed, and then, this short super-reduct is used to determine the actual size of the shortest reducts. Additionally, a new strategy for pruning the search space to find the shortest reducts was introduced.
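The two-stage idea summarized above can be illustrated with a minimal sketch. It makes several assumptions that are not from the paper: the decision system is given as a list of discernibility entries (each a set of attribute indices that discern one pair of objects from different classes), the short super-reduct is found with a simple greedy covering heuristic, and plain enumeration stands in for the authors' pruning strategy, which is not reproduced here. All function names are hypothetical.

```python
from itertools import combinations


def greedy_super_reduct(entries, n_attrs):
    """Greedy heuristic (an assumption, not the paper's method): repeatedly
    pick the attribute that appears in the most not-yet-covered entries."""
    remaining = [frozenset(e) for e in entries if e]  # ignore empty entries
    chosen = set()
    while remaining:
        best = max(range(n_attrs), key=lambda a: sum(a in e for e in remaining))
        chosen.add(best)
        remaining = [e for e in remaining if best not in e]
    return chosen


def hits_all(entries, attrs):
    """True if `attrs` shares at least one attribute with every entry."""
    selected = set(attrs)
    return all(selected & e for e in entries)


def all_shortest_reducts(entries, n_attrs):
    """Stage 1: bound the size with a short super-reduct.
    Stage 2: return every super-reduct of the smallest feasible size.
    At that size each super-reduct is a reduct, since no smaller one exists."""
    entries = [frozenset(e) for e in entries if e]
    upper = len(greedy_super_reduct(entries, n_attrs))
    for size in range(upper + 1):
        found = [c for c in combinations(range(n_attrs), size)
                 if hits_all(entries, c)]
        if found:
            return found
    return []  # unreachable: the greedy result has size `upper`


# Hypothetical discernibility entries over four attributes.
ENTRIES = [{0, 1}, {1, 2}, {0, 2}, {2, 3}]
```

For `ENTRIES`, the greedy step yields a super-reduct of size 2, no single attribute covers every entry, and the size-2 subsets that do are the shortest reducts. The enumeration is exponential in the worst case; the sketch only shows why a known upper bound on the size narrows the search.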

The

CRediT authorship contribution statement

Yanir González-Díaz: Investigation, Software, Data curation, Writing – original draft, Writing – review & editing. José Fco. Martínez-Trinidad: Conceptualization, Methodology, Writing – review & editing. Jesús A. Carrasco-Ochoa: Conceptualization, Methodology, Writing – review & editing. Manuel S. Lazo-Cortés: Conceptualization, Methodology, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

The first author gratefully acknowledges CONACYT for his doctoral fellowship.

References (31)

  • J. Bazan et al., Rough set algorithms in classification problems
  • L. Yu et al., A Rough-Set-Refined Text Mining Approach for Crude Oil Market Tendency Forecasting (2009)
  • J.W. Grzymala-Busse, Rough set theory with applications to data mining, in: Real World Applications of Computational...
  • J. Sil, A.K. Das, Variable Length Reduct Vs. Minimum Length Reduct-A Comparative study, Procedia Technology 4 (2012)...
  • M. Arowolo et al., A survey of dimension reduction and classification methods for rna-seq data on malaria vector, J. Big Data (2021)