
1 Introduction

Granular computing (GC) [16] is a new paradigm in information processing; the term was coined to label a subset of Zadeh's granular mathematics [17]. Among several Artificial Intelligence (AI) strategies, GC is used for knowledge discovery. When working with data instead of knowledge, it is hard to operate on the least amount of information possible without degrading the quality of the solution. A clear example appears in problem solving via rough sets, where it is of interest to define the simplest possible equivalence relation, that is, the one built from the minimum number of attributes that preserves the quality of the resulting approximation sets.

Rough Set Theory (RST) has proven effective for developing machine learning techniques [9, 10]. RST approaches based on multi-granulation (MG) start from the existence of different granulations determined by the relationships \( A_1, A_2, \ldots, A_m \subseteq A_T \), which provide different perspectives of the data; here \( A_T \) is the set of features, and the \( A_i \) are subsets of features called contexts. In the papers we reviewed [6, 9] on the combination of RST and MG (RST+MG), the authors introduced the contexts \( A_i \) without any clear explanation of how they were determined. In the less common approach, each \( A_i \) is identified by experts in the application domain; the most widely used option, and the one we adopt in this research, is to build the contexts automatically. This variant is especially appropriate in domains with many predictive features, and in those where the contexts are not evident. This MG-based RST approach is similar to multi-view learning [15].

Unlike single-view learning, multi-view learning introduces a function to model each particular view and optimizes all functions jointly, exploiting redundant views of the input data. In such a configuration, each view may contain knowledge that the other views lack, so multiple views can describe the data exhaustively and accurately [12]. A review of the multi-view learning literature shows that the topic is closely related to other areas of machine learning, such as active learning and ensemble learning. Ensemble learning can be briefly described as the use of multiple learning models whose predictions are combined [5, 12]. In addition, co-training is one of the oldest schemes for multi-view learning [12].

In this research, we propose a method to construct each \( A_i \), which can be seen as a multi-view. It uses a genetic algorithm (GA) and a measure of dependence between features; an ensemble algorithm similar to co-training is then applied. However, our method differs from co-training in that it does not use any information provided by previous classifiers. In our ensemble algorithm, after obtaining the models of the multiple views separately, an average probability vote is used to perform classification, which is the specific machine learning task we deal with. Classification aims at inferring a function \( f: P \to Y \) from labeled training data \( \{(p_1, y_1), \ldots, (p_m, y_m)\} \), where \( p_i \) is a vector of values (the input) and \( y_i \) is a class value (the output).
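As a minimal illustration of the average probability vote (the data and names below are hypothetical, not taken from the paper), the following Python sketch averages the class-probability outputs of several per-view models and predicts the class with the highest mean probability:

```python
import numpy as np

def soft_vote(prob_matrices):
    """Average probability vote: mean the per-model class probabilities,
    then predict the class with the highest averaged probability."""
    return np.mean(prob_matrices, axis=0).argmax(axis=1)

# Hypothetical predict_proba outputs of three view models (2 samples, 2 classes).
p1 = np.array([[0.9, 0.1], [0.4, 0.6]])
p2 = np.array([[0.6, 0.4], [0.3, 0.7]])
p3 = np.array([[0.7, 0.3], [0.5, 0.5]])
print(soft_vote([p1, p2, p3]))  # -> [0 1]
```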

The remainder of this paper is organized as follows. Section 2 elaborates on the computational methodology. Section 3 introduces our method, Genetic Algorithm and Rough Sets-based Multi-Granulation (GA-RS-MG). Section 4 presents the experimental framework and results, while conclusions and future work remarks are given in Sect. 5.

2 Computational Methodology

To assess the resulting contexts, we built an ensemble algorithm based on the MLP classifier. It reveals how much the generation of contexts benefits machine learning methods.

2.1 Rough Set Theory

RST is an efficient tool for data mining, suitable for discovering dependencies in data, discovering patterns, estimating the significance of data, reducing data, and so forth [2, 4, 13]. In particular, it has been applied remarkably in the field of medicine [9]. RST aims at approximating any concept X ⊆ U (a subset of the domain universe) by a pair of exact sets, the lower and upper approximations. The lower approximation \( B_{*}(X) \) of a set X is defined as the collection of cases (objects of the universe U) whose equivalence classes [2, 13] are totally contained in X. The upper approximation \( B^{*}(X) \) contains those objects of U belonging to equivalence classes, generated by the indiscernibility relation, that contain at least one object x belonging to X. They are formally described as follows:

$$ B_{*}(X) = \left\{ x \in U \mid B(x) \subseteq X \right\} $$
(1)
$$ B^{*}(X) = \left\{ x \in U \mid B(x) \cap X \neq \emptyset \right\} $$
(2)
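A minimal Python sketch of Eqs. (1) and (2), assuming a toy decision table where each object is a dictionary of discrete attribute values (all names and data are illustrative):

```python
from collections import defaultdict

def blocks(universe, attrs):
    """Equivalence classes of the indiscernibility relation on attrs."""
    part = defaultdict(set)
    for i, obj in enumerate(universe):
        part[tuple(obj[a] for a in attrs)].add(i)
    return list(part.values())

def lower_upper(universe, attrs, X):
    """B_*(X): classes fully inside X; B^*(X): classes intersecting X."""
    lower, upper = set(), set()
    for block in blocks(universe, attrs):
        if block <= X:
            lower |= block
        if block & X:
            upper |= block
    return lower, upper

U = [{"a": 0, "b": 1}, {"a": 0, "b": 1}, {"a": 1, "b": 0}, {"a": 1, "b": 1}]
X = {0, 2}  # a target concept, given as a set of object indices
print(lower_upper(U, ["a", "b"], X))  # -> ({2}, {0, 1, 2})
```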

Classic RST works with discrete data, defining the indiscernibility between objects through strict equality of values. When data have features with continuous domains, these are discretized in order to obtain the degree of dependence through equivalence relations; otherwise, it would be necessary to use similarity relations.

Dependence Between Contexts and Decision Features.

Discovering dependencies between attributes is a key issue in data analysis [2]. Let B and D be subsets of the set of attributes A in the information system (U, A). \( B \Rightarrow_{\sigma} D \) denotes that D depends on B in a degree σ (0 ≤ σ ≤ 1). D depends partially on B when σ < 1, whereas \( B \Rightarrow D \) if the degree of dependence σ = 1, i.e., D depends totally on B, which happens when all values of the features in D are uniquely determined by the values of the features in B.

$$ \sigma(B, D) = \frac{\left| POS_{B}(D) \right|}{\left| U \right|} $$
(3)
$$ \text{where}\quad POS_{B}(D) = \bigcup\nolimits_{X \in U/D} B_{*}(X) $$
(4)
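Building on the previous sketch (it assumes the blocks and lower_upper helpers defined there are in scope), Eqs. (3) and (4) can be computed as follows:

```python
def dependence_degree(universe, B, D):
    """sigma(B, D) = |POS_B(D)| / |U|, where POS_B(D) is the union of the
    B-lower approximations of the decision classes U/D (Eqs. 3-4)."""
    pos = set()
    for X in blocks(universe, D):              # the partition U/D
        lower, _ = lower_upper(universe, B, X)
        pos |= lower
    return len(pos) / len(universe)

U = [{"a": 0, "d": 0}, {"a": 0, "d": 0}, {"a": 1, "d": 1}, {"a": 1, "d": 0}]
print(dependence_degree(U, ["a"], ["d"]))  # -> 0.5
```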

2.2 Random Forest and Multilayer Perceptron

Random Forest (RF) [3] is a general method for building an ensemble of L tree-based classifiers. The data set of each classifier includes a subset of variables. The number of trees in the forest and the size of the variable subsets must be set a priori. The number of variables per subset is calculated as \( F = \log_{2} M + 1 \), where M is the number of attributes in the original data set. Each tree is grown to its maximum depth and no pruning is applied afterwards. The predicted class for a given example is determined by aggregating the predictions of the set of decision trees through a majority vote.
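As a worked example of the formula above (assuming the customary rounding down for non-integer results), a data set with M = 16 attributes yields F = 5 variables per subset:

```python
import math

def forest_ri_features(M: int) -> int:
    """Number of variables per subset: F = log2(M) + 1, rounded down."""
    return int(math.log2(M)) + 1

print(forest_ri_features(16))  # -> 5
```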

On the other hand, artificial neural networks (ANNs) are mathematical tools for modeling problems; they reveal functional relationships in data for classification, pattern recognition, regression, and similar tasks. The ANN applied here is the Multilayer Perceptron (MLP) [11], one of the most popular ANN models. For training, MLP uses the backpropagation (BP) learning algorithm to adapt its computed function to the needs of each particular problem.

2.3 Problem Formulation

Usually, a decision system \( DS = (U, A \cup \{d\}) \) is seen as a single set, where A is the set of predictive features and d denotes the decision feature [1, 6, 8, 14]. However, in a DS it is possible to define different contexts (subsets of features \( A_i \subset A \)) that bear a certain relationship with d. Such contexts reveal distinct viewpoints on the relationships between the predictive and decision attributes. Several decision subsystems \( DS_i = (U, A_i \cup \{d\}) \) can be obtained by using different contexts \( A_i \). Those contexts can emerge from the set of predictive features in a natural way. Consider, for instance, a DS with information on college students, where A includes a few features on their social status, others about high school grades, others regarding the entrance examination, and so forth. Each of those sets of features offers a unique outlook on the student.
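A small sketch of the idea, with hypothetical column names standing in for the student features mentioned above; each decision subsystem \( DS_i \) is simply a column subset paired with the decision feature d:

```python
import pandas as pd

ds = pd.DataFrame({
    "parent_income": [1, 2, 2],                 # social-status context
    "urban_home":    [0, 1, 1],
    "hs_math":       [4, 3, 5],                 # high-school grades context
    "hs_language":   [5, 3, 4],
    "exam_score":    [88, 70, 95],              # entrance-examination context
    "d":             ["pass", "fail", "pass"],  # decision feature
})

contexts = {
    "social": ["parent_income", "urban_home"],
    "grades": ["hs_math", "hs_language"],
    "exam":   ["exam_score"],
}

# DS_i = (U, A_i ∪ {d}): one column subset per context.
subsystems = {name: ds[cols + ["d"]] for name, cols in contexts.items()}
```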

In real-world scenarios, it is not easy to create proper contexts \( A_i \) from the predictive features. It is therefore necessary to tackle the problem of creating suitable contexts on which to apply machine learning methods. That is precisely the problem we address in this paper, by introducing a method to build contexts to be used in classification tasks.

3 Proposed Method

Genetic Algorithm and Rough Sets-based Multi-Granulation (GA-RS-MG) is the method that we propose to carry out a multi-granulation from the feature viewpoint, in order to develop context-based machine learning methods. Our method relies on a GA to automatically determine the contexts of each DS. It uses a generational model with elitist replacement: the fittest individual of the previous population survives into the current one. In this specific case, chromosomes (individuals) have as many genes as there are predictive features in the DS. Chromosomes have a binary representation, where value 1 indicates that the corresponding feature is selected, and 0 that it is not.
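The representation can be sketched as follows (hypothetical helpers, not the authors' code): a chromosome is a list of 0/1 genes, one per predictive feature, and its 1-genes select the features of the context it encodes:

```python
import random

def init_population(pop_size: int, n_features: int, seed: int = 0):
    """Each gene is set to 1 or 0 with probability 0.5."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(n_features)]
            for _ in range(pop_size)]

def context_of(chromosome, features):
    """Decode a chromosome into the feature subset (context) it selects."""
    return [f for f, gene in zip(features, chromosome) if gene == 1]
```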

We used the measure of dependence described by Eq. (3) as the fitness function, where each chromosome represents a context. For each decision class \( D_i \), \( B_{*}(D_i) \) is calculated by Eq. (1). The overall sum of the numbers of objects in the lower approximations is divided by the cardinality of the universe U, which in this case is the number of instances in the DS. In this way, we evaluate the degree of dependence of each chromosome with respect to the decision feature. From the last GA population, among the individuals with the highest dependence degrees, the most mutually different ones are selected.

The pseudocode in Table 1 describes the algorithmic basis of GA-RS-MG. Line 1 initializes the population: every gene is set to 1 or 0 with a probability of 0.5. Then, the fitness of each individual is evaluated. A maximum of 100 iterations is used as the stop condition for the evolutionary loop (line 2), and the population size is s = 44. We set the crossover probability to pc = 0.7 and the mutation probability to pm = 0.09. The values of all the aforementioned parameters were chosen empirically.
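A compact skeleton of the evolutionary loop with the stated parameters; since Table 1 is not reproduced here, the selection and one-point crossover operators below are illustrative assumptions, and fitness stands for any function mapping a chromosome to σ(context, d):

```python
import random

def evolve(n_features, fitness, s=44, pc=0.7, pm=0.09, max_iters=100, seed=0):
    """Generational GA with elitist replacement, as described in the text."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(s)]
    for _ in range(max_iters):
        ranked = sorted(pop, key=fitness, reverse=True)
        elite = ranked[0]                            # fittest individual survives
        children = []
        while len(children) < s:
            a, b = rng.sample(ranked[: s // 2], 2)   # assumed truncation selection
            child = a[:]
            if rng.random() < pc:                    # one-point crossover
                cut = rng.randrange(1, n_features)
                child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < pm else g for g in child]  # mutation
            children.append(child)
        worst = min(range(s), key=lambda i: fitness(children[i]))
        children[worst] = elite                      # elitist replacement (line 16)
        pop = children
    return pop
```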

Table 1. Genetic Algorithm and Rough Sets-based Multi-Granulation (GA-RS-MG).

In line 16, the best current individual replaces the worst descendant. We determine the number of contexts (line 19) according to a uniform distribution between a minimum of 3 individuals and a maximum of one third of the population size (s/3 individuals). We select the best individuals by their degree of dependence (line 20), preferring those that are more dependent but also more different from each other. Two individuals (chromosomes) are considered distinct if they differ at any i-th position (gene). We aim to obtain the best contexts at once, while ensuring that they differ with regard to their predictive features. Finally, the selected best individuals become the output set of contexts.
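A possible greedy reading of lines 19-20 (the exact trade-off between dependence and diversity is not detailed in the text, so this sketch is an assumption): draw the number of contexts uniformly from [3, s/3], then scan the individuals by decreasing fitness, keeping a candidate only if it differs from every context kept so far:

```python
import random

def hamming(a, b):
    """Number of gene positions at which two chromosomes differ."""
    return sum(x != y for x, y in zip(a, b))

def pick_contexts(population, fitness, s=44, seed=0):
    rng = random.Random(seed)
    k = rng.randint(3, s // 3)                      # line 19: number of contexts
    ranked = sorted(population, key=fitness, reverse=True)
    chosen = [ranked[0]]
    for cand in ranked[1:]:
        if len(chosen) == k:
            break
        if all(hamming(cand, c) > 0 for c in chosen):  # differ in >= 1 gene
            chosen.append(cand)
    return chosen
```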

4 Results and Discussion

For the experiments, we used data sets (see Table 2) from the University of California at Irvine (UCI) repository. The degree of dependence between the contexts produced by GA-RS-MG and the decision feature satisfies the condition σ(Ai, d) ≥ 0.75. The contexts found for each single DS do not have the same number of attributes. We assessed the suitability of such contexts by applying MLP to discover knowledge from them. For MLP and RF we used the default parameter setups of the WEKA data-mining tool, version 3.8.

Table 2. Benchmark decision systems: description and generated contexts.

To ensure statistical robustness, we performed a 10-fold cross-validation with one run per DS. Besides, the model built from each original DS was added to a voting algorithm along with the models of the corresponding contexts produced by GA-RS-MG. Such an ensemble algorithm has MLP as its base classifier. Table 3 shows the results (for the weighted precision (WP) and mean absolute error (MAE) evaluation measures) achieved by the proposed ensemble method with MLP as base classifier (VoteMLP), as well as by MLP and RF. Highlighted values are the overall best results for each DS. VoteMLP exhibits the best results regarding WP.
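The experiments were run in WEKA; as a rough scikit-learn analogue (the data set, contexts and parameters below are illustrative, not the paper's setup), one can obtain 10-fold out-of-fold probabilities for one MLP per context plus one on the full DS, and combine them by an average probability vote:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
contexts = [[0, 1], [2, 3], [0, 1, 2, 3]]  # two toy contexts plus the full DS

# Out-of-fold class probabilities for one MLP per context (10-fold CV).
probas = [
    cross_val_predict(MLPClassifier(max_iter=1000, random_state=0),
                      X[:, cols], y, cv=10, method="predict_proba")
    for cols in contexts
]
pred = np.mean(probas, axis=0).argmax(axis=1)  # average probability vote
print("accuracy:", (pred == y).mean())
```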

Table 3. Experimental results.

Figure 1 depicts the mean and the standard deviation of the classification algorithms, concerning both WP and MAE. VoteMLP and MLP reached the best performance. To detect significant differences within the group of methods, we applied the Friedman test to the results for WP and MAE. Table 4 shows the average ranking of each method. The p-values computed by the Friedman test were 0.2636 and 4.1770E−05 for WP and MAE, respectively. Accordingly, there are significant differences among the three algorithms regarding MAE, for a significance level α = 0.05. Consequently, in a post-hoc stage, we applied the Holm test in order to detect significant differences between pairs of algorithms, as recommended in [7].

Fig. 1. Box plot of evaluation measures.

Table 4. Friedman test and Holm test.

Table 4 shows the adjusted p-values computed by the Holm test for each pair of methods in the comparison hypotheses, for a significance level α = 0.05. Regarding weighted precision, MLP is outperformed by RF, which is in turn outperformed by VoteMLP; however, none of these superiorities is statistically significant. In contrast, in terms of mean absolute error, MLP significantly outperforms both VoteMLP and RF, and the poorest results belong to RF, which is also significantly outperformed by VoteMLP. Considering all the above, MLP and VoteMLP exhibit the best performance.

5 Conclusions

We have proposed Genetic Algorithm and Rough Sets-based Multi-Granulation, which creates granules (contexts) by means of a GA. Each context must fulfill a certain degree of dependence with respect to the decision feature. Moreover, the obtained models are simpler and, as a whole, more precise. We consider the models of the contexts and of the original DS for classification by an ensemble. The proposed method shows suitable results, statistically assessed by comparing the performance of the MLP, VoteMLP and RF classifiers.

VoteMLP was superior to both MLP and RF in terms of weighted precision, and RF outperformed MLP, though in neither case significantly. Regarding the mean absolute error, VoteMLP significantly outperforms RF, and MLP outperforms them both. The results of the proposed approach are comparable with those of RF. Even when RF performs better, our method has an extra advantage: RF uses 100 trees to find a solution, whereas our method builds only as many base models as there are constructed contexts (see Table 2). That is a good reason to assess not only its effectiveness but also its efficiency.