Algorithm for computing all the shortest reducts based on a new pruning strategy
Introduction
In Rough Set Theory (RST) [1], supervised classification problems are represented through Decision Systems. A Decision System is a table in which the rows and columns represent the objects and attributes, respectively, of a data set. The minimal subsets of attributes that preserve the capability of the whole set of attributes to discern objects from different classes in a decision system are called reducts.
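To make the definition concrete, the following minimal Python sketch checks whether a subset of attributes is a super-reduct or a reduct of a small decision system. The data and the function names (`ROWS`, `discerns_like_full_set`, `is_reduct`, `all_reducts`) are hypothetical and chosen only for illustration; the brute-force enumeration is not the algorithm proposed in this paper.

```python
from itertools import combinations

# Hypothetical toy decision system: (condition attribute values, class label).
ROWS = [
    ((0, 0, 0), "yes"),
    ((1, 1, 0), "no"),
    ((0, 0, 1), "no"),
]
N_ATTRS = 3


def discerns_like_full_set(attrs, rows=ROWS, n_attrs=N_ATTRS):
    """True if `attrs` separates every pair of objects from different classes
    that the full attribute set separates (i.e. `attrs` is a super-reduct)."""
    for (x, cx), (y, cy) in combinations(rows, 2):
        if cx == cy:
            continue  # only pairs from different classes matter
        full_differs = any(x[a] != y[a] for a in range(n_attrs))
        if full_differs and not any(x[a] != y[a] for a in attrs):
            return False
    return True


def is_reduct(attrs, rows=ROWS, n_attrs=N_ATTRS):
    """A reduct is a super-reduct from which no attribute can be removed."""
    attrs = tuple(attrs)
    if not discerns_like_full_set(attrs, rows, n_attrs):
        return False
    return all(
        not discerns_like_full_set(tuple(a for a in attrs if a != r), rows, n_attrs)
        for r in attrs
    )


def all_reducts(rows=ROWS, n_attrs=N_ATTRS):
    """Brute-force enumeration, exponential in the number of attributes."""
    return [
        set(c)
        for k in range(1, n_attrs + 1)
        for c in combinations(range(n_attrs), k)
        if is_reduct(c, rows, n_attrs)
    ]


print(all_reducts())  # [{0, 2}, {1, 2}]
```

In this toy table the full attribute set is a super-reduct but not a reduct, because attribute 0 (or attribute 1) can be dropped without losing the ability to discern objects from different classes.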
The problem of computing all the reducts of a decision system has exponential complexity with respect to the number of attributes in the decision system [2]. Thus, several approximate algorithms for reduct computation have been developed using heuristics based on the importance of attributes [3], mutual information [4], and genetic algorithms [5], [6], among others.
The main drawback of these algorithms is that they do not necessarily return the whole set of reducts of a decision system. Sometimes they even return super-reducts, i.e., subsets that are not necessarily irreducible. Additionally, for some applications, not all the reducts are needed [7], [8], and some studies have evaluated the feasibility of computing only the shortest reducts instead of computing all of them [9].
Among other areas, the shortest reducts can be used in dimensionality reduction [10], [11], [12], feature selection [13], [14], and building classifiers based on rules [15], [16]. In the worst case, the computation of all the shortest reducts has the same exponential complexity as computing all the reducts. However, in practice, computing only the shortest reducts is generally much faster than computing the whole set of reducts. Thus, in practical applications such as those mentioned above, computing the shortest reducts may be preferred over computing all the reducts.
In this paper, we propose an algorithm for computing all the shortest reducts in a decision system. The algorithm reuses some of the most effective search-space pruning methods from state-of-the-art algorithms for computing all reducts and introduces some new pruning methods. Our new algorithm is compared against state-of-the-art algorithms, using synthetic and real-world datasets.
The rest of this paper is structured as follows: in Section 2, we present the related work; Section 3 contains the theoretical background; Section 4 details our proposal; in Section 5, we show and discuss the experimental results; and finally, our conclusions and future work directions are presented in Section 6.
Related work
One of the first studies reported in the literature to compute all the shortest reducts appears in [17]. In this study, two algorithms were introduced, one to compute all K-reducts (reducts of length less than or equal to K) and another one to compute all the shortest reducts. Both algorithms are part of the Modified Reducts Generation Algorithm (MRGA), which is the author’s main proposal. In MRGA, the application of absorption laws over the discernibility function was introduced, as a
Basic concepts
This section introduces some basic concepts which provide the theoretical basis needed to understand the proposed algorithm.
The most common way to represent the data of a supervised problem in Rough Set Theory (RST) is through Decision Systems (DS). A Decision System is a pair $DS = (U, A \cup \{d\})$, where $U$ is a finite non-empty set of objects called the universe, $A$ is a non-empty set of attributes, called condition attributes, and $d$ is an attribute, named the decision attribute, such
Proposed algorithm
The proposed algorithm is based on the idea that if the size of the shortest reducts is known a priori, then all the shortest reducts can be computed more efficiently, because the problem reduces to finding all the super-reducts of that size. Therefore, our algorithm consists of two sub-tasks: the first one is to find the size of the shortest reducts, starting from the size of a short super-reduct found through a heuristic process (see Subsection 4.2), and in the second
Experiments and results
This section presents a comparative analysis of the proposed algorithm against MiLIT [22] and MinReduct [23], which are the fastest state-of-the-art algorithms for computing all the shortest reducts. For our experiments, we implemented the proposed algorithm in Java, and we used the authors’ implementations of MiLIT and MinReduct, which are also written in Java.
Evaluations are performed using real-world datasets and synthetic simplified binary discernibility matrices. The real-world datasets were taken
Conclusions
In this paper, a new algorithm to compute all the shortest reducts was introduced. The proposed algorithm is based on the idea that if the shortest reducts’ size is known a priori, they can be computed more efficiently. Therefore, a new method to quickly find a short super-reduct was proposed, and then, this short super-reduct is used to determine the actual size of the shortest reducts. Additionally, a new strategy for pruning the search space to find the shortest reducts was introduced.
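The two-step idea summarized above can be illustrated with a minimal Python sketch. It assumes the decision system has already been turned into a list of discernibility clauses (each clause is the set of attributes that discern one pair of objects from different classes). The greedy heuristic, the exhaustive search, the data, and all names (`CLAUSES`, `greedy_super_reduct`, `shortest_reducts`) are assumptions made for illustration; they stand in for, and do not reproduce, the heuristic and pruning strategy proposed in this paper.

```python
from itertools import combinations

# Hypothetical discernibility clauses over 7 attributes (0..6).
CLAUSES = [{0, 1, 3}, {0, 1, 4}, {0, 2, 3}, {0, 2, 4}, {1, 5}, {2, 6}]


def is_super_reduct(attrs, clauses):
    """`attrs` is a super-reduct if it intersects every discernibility clause."""
    chosen = set(attrs)
    return all(chosen & clause for clause in clauses)


def greedy_super_reduct(clauses):
    """Assumed heuristic: repeatedly pick the attribute that covers the most
    still-uncovered clauses. Its length is only an upper bound on the size
    of the shortest reducts."""
    uncovered = [set(c) for c in clauses]
    chosen = []
    while uncovered:
        candidates = sorted(set().union(*uncovered))
        best = max(candidates, key=lambda a: sum(a in c for c in uncovered))
        chosen.append(best)
        uncovered = [c for c in uncovered if best not in c]
    return chosen


def shortest_reducts(clauses, n_attrs):
    """Exhaustive sketch with no pruning, exponential in the worst case."""
    clauses = [set(c) for c in clauses]
    size = len(greedy_super_reduct(clauses))
    # Step 1: lower the bound while a smaller super-reduct exists.
    while size > 1 and any(
        is_super_reduct(c, clauses) for c in combinations(range(n_attrs), size - 1)
    ):
        size -= 1
    # Step 2: every super-reduct of minimum size is a reduct, so list them all.
    return [
        set(c) for c in combinations(range(n_attrs), size) if is_super_reduct(c, clauses)
    ]


print(greedy_super_reduct(CLAUSES))   # [0, 1, 2]
print(shortest_reducts(CLAUSES, 7))   # [{1, 2}]
```

Here the greedy super-reduct has three attributes, while the shortest reduct has two, so the first step tightens the bound before the second step enumerates the super-reducts of the minimum size.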
The
CRediT authorship contribution statement
Yanir González-Díaz: Investigation, Software, Data curation, Writing – original draft, Writing – review & editing. José Fco. Martínez-Trinidad: Conceptualization, Methodology, Writing – review & editing. Jesús A. Carrasco-Ochoa: Conceptualization, Methodology, Writing – review & editing. Manuel S. Lazo-Cortés: Conceptualization, Methodology, Writing – review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgement
The first author gratefully acknowledges CONACYT for his doctoral fellowship.
References (31)
- et al., A novel dimension reduction and dictionary learning framework for high-dimensional data classification, Pattern Recogn. (2021)
- et al., An improved runner-root algorithm for solving feature selection problems based on rough sets and neighborhood rough sets, Appl. Soft Comput. (2020)
- et al., Earc: Evidential association rule-based classification, Inf. Sci. (2021)
- et al., Minreduct: A new algorithm for computing the shortest reducts, Pattern Recogn. Lett. (2020)
- et al., Discernibility matrix simplification for constructing attribute reducts, Inf. Sci. (2009)
- Rough sets, Int. J. Comput. Inf. Sci. (1982)
- et al., The discernibility matrices and functions in information systems, Intelligent Decision Support-Handbook of Applications and Advances of the Rough Sets Theory, Knowledge Engineering and Problem Solving (1992)
- X. Hu, Knowledge discovery in databases: an attribute-oriented rough set approach (Ph.D. thesis), University of Regina...
- G.Y. Wang, J. Zhao, J.J. An, Y. Wu, Theoretical study on attribute reduction of rough set theory: comparison of algebra...
- J. Wroblewski, Finding minimal reducts using genetic algorithms, in: Proceedings of the second annual joint conference...