1 Introduction

A feature is an individual measurable property of a phenomenon being observed. The representation of raw input data typically uses many features, only some of which are relevant to the class. Feature selection for supervised classification can be carried out on the basis of entropy-based information measures between features and classes. In this work, we use Shannon entropy [10] and Renyi's and Tsallis entropies [6] to compute the mutual information [3] between a feature and the class, or between pairs of features. mRMR [8] is known to be a practical and effective algorithm for feature selection and classification; however, it does not perform well when a dataset contains only a small number of attributes [8]. The main motivation behind our work is to develop an enhanced feature selection algorithm that performs consistently well on all kinds of datasets. We develop an ensemble method for entropy-based feature selection and evaluate it with common machine learning algorithms on a variety of UCI gene expression datasets, along with a comparative study of existing entropy-based feature selection methods. Our method eliminates irrelevant and redundant features and, in the majority of cases, improves the performance of the learning algorithms.

2 Related Work

In the past two decades, a good number of MI-based feature selection algorithms have been introduced. Two important aspects of feature selection are: (i) minimum redundancy among the selected features and (ii) maximum relevance of a feature to a given class label. Some well-known MI-based feature selection algorithms are Information Gain [1], Gain Ratio [3], mRMR [8] and its variant [9]. InfoGain and GainRatio select features based on relevance only, whereas the other MI-based algorithms mentioned select the most relevant and least redundant features. From our study, we observe that mRMR is appropriate for a large number of applications involving large numbers of features [8], and it performs well on both continuous and discrete data.

To achieve minimum redundancy and maximum relevance for categorical variables [5], most researchers adopt the view that if a feature's values are uniformly distributed across the different classes, its mutual information with those classes is zero, whereas a feature that is highly differentially expressed across classes has large mutual information with them. Thus, mutual information can be used as a measure to estimate the relevance of features.
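As a concrete illustration of this relevance argument (not taken from the paper), the following Python sketch estimates the mutual information between a discrete feature and the class labels; the tiny two-class example and the helper functions are assumptions made purely for demonstration.

```python
# Minimal sketch: feature-class mutual information as a relevance score,
# assuming discrete (already discretized) feature values.
import numpy as np
from collections import Counter

def entropy(labels):
    """Shannon entropy of a discrete sequence, in bits."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

# A feature spread uniformly over both classes carries ~0 bits about the class,
# while a differentially expressed feature carries close to 1 bit.
classes  = [0, 0, 0, 0, 1, 1, 1, 1]
uniform  = [1, 2, 1, 2, 1, 2, 1, 2]   # same distribution in both classes
relevant = [1, 1, 1, 1, 2, 2, 2, 2]   # differentially expressed
print(mutual_information(uniform, classes))   # approximately 0.0
print(mutual_information(relevant, classes))  # approximately 1.0
```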

The mRMR algorithm aims to select a feature set S that shows maximum relevance to a given class (the features provide maximum information about the class) while being minimally redundant. mRMR considers the mutual information of each feature with the classes, but also subtracts the redundancy of each feature with the already selected ones. It follows a filter criterion based on mutual information estimation: instead of estimating the mutual information between the whole set of features and the class labels, the authors estimate it for each candidate feature separately. On one hand, they maximize the relevance \(I(x_j; C)\) of each candidate feature, and on the other hand they minimize its redundancy with the already selected features. The criterion for selecting the \(m^{th}\) feature can be expressed as:

$$\begin{aligned} \max _{x_j \in X - S_{m-1}}\left[ I(x_j;C)- \frac{1}{m-1} \sum _{x_i \in S_{m-1}} I(x_j;x_i)\right] . \end{aligned}$$
(1)

This criterion can be used by a greedy algorithm which, in each iteration, takes a single feature and decides whether to add it to the selected feature set or to discard it; the process is repeated until the required set S of K optimal features is obtained. This implies that the \(m^{th}\) feature \(x_m\) is selected only when a set \(S_{m-1}\) of \((m-1)\) features already exists. We refer to the original mRMR method as \(mRMR_{MI}\). Another variant of the mRMR criterion [9] also exists (referred to here as \(mRMR_{GR}\)). In [9], the criterion is reformulated using a different representation of redundancy: the authors propose a coefficient of uncertainty obtained by dividing the MI value between two features \(x_j\) and \(x_i\) by the entropy \(H(x_i)\), where \(x_i \in S_{m-1}\). The resulting criterion is given below.

$$\begin{aligned} \max _{x_j \in X - S_{m-1}}\left[ I(x_j;C)- \frac{1}{m-1} \sum _{x_i \in S_{m-1}}\frac{I(x_j;x_i)}{H(x_i)}\right] \end{aligned}$$
(2)
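To make the two criteria concrete, the following Python sketch implements a plain greedy forward selection for Eq. 1 (\(mRMR_{MI}\)) and Eq. 2 (\(mRMR_{GR}\)). It is an illustrative reimplementation, not the original authors' code; it assumes the data matrix X is already discretized and uses scikit-learn's mutual_info_score as the MI estimator.

```python
# Greedy forward selection for the mRMR_MI (Eq. 1) and mRMR_GR (Eq. 2) criteria.
# X: (n_samples, n_features) array of discretized values; y: class labels.
import numpy as np
from sklearn.metrics import mutual_info_score

def entropy(x):
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def greedy_mrmr(X, y, k, variant="MI"):
    n_features = X.shape[1]
    relevance = [mutual_info_score(X[:, j], y) for j in range(n_features)]
    selected = [int(np.argmax(relevance))]        # start from the most relevant feature
    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            if variant == "MI":                   # Eq. 1: average MI redundancy
                red = np.mean([mutual_info_score(X[:, j], X[:, i]) for i in selected])
            else:                                 # Eq. 2: MI normalized by H(x_i)
                red = np.mean([mutual_info_score(X[:, j], X[:, i]) / entropy(X[:, i])
                               for i in selected])
            score = relevance[j] - red
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected
```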

In this study, we use these two variants of the mRMR algorithm in our experiments, analyze their pros and cons, and introduce another variant of mRMR that is an effective combination of the two.

3 \(mRMR+:\) The Proposed Ensemble mRMR Algorithm

We carried out an exhaustive experimental study using the two variants of mRMR on benchmark datasets. Our observation is that the MI-based mRMR (i.e., \(mRMR_{MI}\)) does not perform well when the number of attributes in the dataset is small [8]. The second variant of mRMR can eliminate this disadvantage of \(mRMR_{MI}\); however, we found from our exhaustive experimentation that \(mRMR_{GR}\) in turn often performs poorly when the number of attributes is large.

Our proposal, \(mRMR+\), is an effective combination of the above two mRMR variants through a weight function. We performed exhaustive experimentation to determine an appropriate weight function dynamically, and found that in most cases \(mRMR_{MI}\) does not perform well when the MI value between two variables is larger than the corresponding GR (gain ratio) value. To eliminate this problem, we combine the two variants of mRMR (Eqs. 1 and 2) in such a way that the combined criterion performs consistently well for any number of variables. Our method performs better than the above variants of mRMR on almost all datasets. We also perform a comparative analysis among all three variants of mRMR using the aforesaid three entropy measures. The proposed formulation of \(mRMR+\) for the selection of the \(m^{th}\) feature is as follows:

$$\begin{aligned} \max _{x_j \in X - S_{m-1}}\left[ I(x_j;C)-\left( \frac{l}{m-1}\sum _{x_i\in S_{m-1}} I(x_j;x_i)+\frac{1-l}{m-1} \sum _{x_i \in S_{m-1}}\frac{I(x_j;x_i)}{H(x_i)}\right) \right] . \end{aligned}$$
(3)
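A minimal sketch of the Eq. 3 score for a single candidate feature is given below. It assumes the discretized data and MI estimator of the earlier sketch, and takes the weight l (whose computation is described in the following paragraphs) as a parameter.

```python
# Eq. 3 score of candidate feature x_j given the already selected set.
# Assumes discretized columns; l blends the Eq. 1 and Eq. 2 redundancy terms.
import numpy as np
from sklearn.metrics import mutual_info_score

def _entropy(x):
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def mrmr_plus_score(X, y, j, selected, l):
    relevance = mutual_info_score(X[:, j], y)                        # I(x_j; C)
    red_mi = np.mean([mutual_info_score(X[:, j], X[:, i])            # Eq. 1 term
                      for i in selected])
    red_gr = np.mean([mutual_info_score(X[:, j], X[:, i]) / _entropy(X[:, i])
                      for i in selected])                            # Eq. 2 term
    return relevance - (l * red_mi + (1.0 - l) * red_gr)
```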

Our method takes a gene expression dataset as input and applies discretization in a preprocessing step to eliminate noise from the data. The value of the weight function l is computed before finding the top relevant feature based on the MI value between feature and class. After that, using Eq. 3, we find the least redundant and most relevant feature among the remaining features and add one feature at a time to the selected feature list until the required K optimal features are selected.
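The discretization scheme itself is not restated here (we use the same technique as the cited mRMR variants); the sketch below shows one common three-level scheme from the mRMR literature, based on mean ± standard deviation thresholds, purely as an assumed placeholder for this preprocessing step.

```python
# Illustrative preprocessing only: three-level discretization of expression data.
# The mean +/- alpha*std thresholds are an assumption, not the paper's exact recipe.
import numpy as np

def discretize_three_level(X, alpha=1.0):
    """Map each column of X to {-1, 0, 1} using per-feature mean +/- alpha*std."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    out = np.zeros(X.shape, dtype=int)
    out[X > mu + alpha * sigma] = 1
    out[X < mu - alpha * sigma] = -1
    return out
```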

Our proposed weight function takes the gene expression data as input and calculates \(m=\max _i MI(x_{i},C)\) and \(n=\max _i \frac{MI(x_i,C)}{H(C)}\) (a gain-ratio-style normalization of MI by the class entropy). If \(m\ge n\), the weight is calculated as \(l=1-\frac{n}{m}\); otherwise it is calculated as \(l=\frac{m}{n}\). To select the \(m^{th}\) feature, the computational complexity of this incremental search is \(O(|S| \cdot M)\), where M is the number of attributes in the dataset, which is the same as that of the MI-based mRMR algorithm [8].
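The following sketch mirrors the weight computation described above, assuming discretized inputs; the class entropy \(H(C)\) is obtained as \(I(C;C)\), and the variable names are illustrative rather than taken from the paper.

```python
# Weight l for mRMR+: compares the best raw MI (m) with the best H(C)-normalized MI (n).
import numpy as np
from sklearn.metrics import mutual_info_score

def compute_weight(X, y):
    h_c = mutual_info_score(y, y)                       # I(C;C) = H(C)
    mi = np.array([mutual_info_score(X[:, i], y) for i in range(X.shape[1])])
    m = mi.max()                                        # m = max_i I(x_i; C)
    n = (mi / h_c).max()                                # n = max_i I(x_i; C) / H(C)
    return 1.0 - n / m if m >= n else m / n
```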

Table 1. Datasets and 10-fold cross validation accuracy of the classifiers using all features

4 Experimental Results

To evaluate the usefulness of the different variants of the mRMR algorithm and the different entropy measures, five UCI machine learning gene expression datasets with \(\ge 2\) classes were chosen; they are presented in Table 1. Table 1 also reports the 10-fold cross validation accuracy obtained with all features using three classification methods, viz., Naive Bayes (NB) [7], Random Forest (RF) [4] and IBK [2]. Generally, RF performed better than the other classification methods due to its suitability for high dimensional data. To discretize the datasets, we use the same discretization technique as the two mentioned variants of mRMR. Due to space constraints, we are unable to present detailed results.

Figure 1(a) presents the classification accuracy of the NB classifier on the lung cancer dataset when features are selected in the forward direction. The average classification accuracies of Shannon, Renyi's and Tsallis entropy based MIs are 85.31%, 77.50% and 82.50%, respectively, whereas the average classification accuracy of the Shannon entropy based mRMR variants is 88.12% and our proposed method achieves 88.44% on average. Figure 1(b) reports results on the colon tumor dataset with the NB classifier when the top ranked features are selected in the forward direction. The average classification accuracies of Shannon, Renyi's and Tsallis entropy based MIs are 86.94%, 86.61% and 86.61%, respectively, and Shannon entropy based mRMR dominates the other entropy based mRMR results. The average classification accuracy of the Shannon entropy based mRMR variants is 88.87%, whereas our proposed method achieves 89.52% on average.

Table 2 reports the average classification accuracy of the NB classifier for the top ten selected features under \(mRMR_{MI}\) with the different entropy measures. We found that the Shannon entropy based mRMR variants always dominate the other entropy based mRMR variants; therefore, the remaining experimental results consider only Shannon entropy based mRMR variants. Figure 1(c) reports the classification accuracy of the NB classifier on the breast cancer dataset; the average classification accuracies of Shannon, Renyi's and Tsallis entropy based MIs for this dataset are the same (95.29%). For the Shannon entropy based mRMR variants, the average classification accuracy is 95.81%, and our method achieves 95.85% on average. Figure 1(d) presents the classification accuracy of the NB classifier on the promoter dataset; the average classification accuracies of Shannon, Renyi's and Tsallis entropy based MIs for this dataset are 90.37%, 90.467% and 89.90%, respectively. For the Shannon entropy based mRMR variants, the average classification accuracy is 90.97%, and our method achieves 91.03% on average. Figure 1(e), (f) and (g) report results on the NCI dataset using the NB, IBK and RF classifiers, respectively. The average classification accuracies of Shannon, Renyi's and Tsallis entropy based MIs for the NCI dataset are 53.33%, 56.50% and 55.00%, respectively, and \(mRMR+\) shows higher average classification accuracies for the NB, IBK and RF classifiers.
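For clarity about the evaluation protocol only, the sketch below runs 10-fold cross validation with scikit-learn stand-ins for the three classifiers (GaussianNB for NB, RandomForestClassifier for RF, and KNeighborsClassifier as an approximation of WEKA's IBk); the selected-feature matrix and hyperparameters are placeholders, not the exact configuration used in our experiments.

```python
# 10-fold cross validation of NB, RF and a k-NN stand-in for IBk on the
# columns returned by the feature selection step (X_selected).
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

def evaluate(X_selected, y):
    models = {
        "NB": GaussianNB(),
        "RF": RandomForestClassifier(n_estimators=100, random_state=0),
        "IBK": KNeighborsClassifier(n_neighbors=1),
    }
    return {name: cross_val_score(clf, X_selected, y, cv=10).mean()
            for name, clf in models.items()}
```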

Fig. 1. (a) Accuracy of NB classifier on Lung Cancer. (b) Accuracy of NB classifier on Colon Tumor. (c) Accuracy of NB classifier on Breast Cancer. (d) Accuracy of NB classifier on Promoter. (e) Accuracy of NB classifier on NCI. (f) Accuracy of IBK classifier on NCI. (g) Accuracy of RF classifier on NCI.

Table 2. Average classification accuracy of NB classifier in % for top 10 features

Finally, we present the effectiveness of our method in Table 2. Table 2 shows that Shannon entropy based \(mRMR_{MI}\) outperforms Renyi's entropy based \(mRMR_{MI}\) on four out of five datasets and Tsallis entropy based \(mRMR_{MI}\) on all five datasets. Moreover, our method \(mRMR+\), based on Shannon entropy, consistently performed well on all datasets in every aspect of our analysis.

5 Conclusions

Our method, referred to as \(mRMR+\), performs significantly better than the competing mRMR algorithm and its variant over five benchmark datasets. Our study also includes an exhaustive empirical comparison of three well known entropy measures, used while selecting relevant and non-redundant features to achieve the best possible classification accuracy.