Abstract
This paper presents a novel multiclass feature selection algorithm based on weighted conditional entropy, also referred to as uncertainty. The goal of the proposed algorithm is to select a feature subset such that, for each sample, at least one feature in the selected subset has a low uncertainty score. Features are first quantized into bins. The proposed method then computes an uncertainty vector from the weighted conditional entropy; the lower the uncertainty score for a class, the better the separability of the samples in that class. Next, an iterative procedure selects one feature per iteration by (1) computing, for each candidate feature subset, the minimum uncertainty score of each sample, (2) averaging these minimum scores across all samples, and (3) selecting the feature that minimizes this mean of minimum uncertainty scores. Experimental results show that the proposed algorithm outperforms mRMR, achieving lower misclassification rates on various publicly available datasets. In most cases, the number of features needed to reach a specified misclassification error is smaller than that required by traditional methods. Across all datasets, the misclassification error is reduced by 5–25% on average compared to a traditional method.
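The selection loop described in the abstract can be sketched in code. The following is a minimal, hypothetical Python/NumPy illustration assembled from the abstract alone: `uncertainty_scores` stands in for the paper's weighted conditional entropy (the paper's exact weighting may differ), and `m3u_select` implements the greedy "minimum mean minimum uncertainty" step. All function and variable names here are illustrative, not the authors' implementation.

```python
import numpy as np

def uncertainty_scores(X_binned, y, n_classes):
    """Per-feature, per-class weighted conditional entropy H(class | feature bin).
    A lower score for a class suggests the feature separates that class well.
    (Sketched from the abstract; not the paper's exact formulation.)"""
    n_samples, n_features = X_binned.shape
    U = np.zeros((n_features, n_classes))
    for f in range(n_features):
        for b in np.unique(X_binned[:, f]):
            mask = X_binned[:, f] == b
            p_bin = mask.mean()                          # weight of this bin
            counts = np.bincount(y[mask], minlength=n_classes)
            p = counts / counts.sum()                    # class distribution in the bin
            with np.errstate(divide="ignore", invalid="ignore"):
                h = np.where(p > 0, -p * np.log2(p), 0.0)  # per-class entropy terms
            U[f] += p_bin * h                            # accumulate weighted entropy
    return U

def m3u_select(X_binned, y, n_select, n_classes):
    """Greedy M3U-style selection: each iteration adds the feature that
    minimizes the mean (over samples) of the minimum per-sample uncertainty."""
    U = uncertainty_scores(X_binned, y, n_classes)
    sample_u = U[:, y]            # (n_features, n_samples): score of each feature for each sample's class
    best_so_far = np.full(len(y), np.inf)
    selected, remaining = [], list(range(X_binned.shape[1]))
    for _ in range(n_select):
        # mean-of-min uncertainty if candidate f were added to the subset
        cand = [np.minimum(best_so_far, sample_u[f]).mean() for f in remaining]
        f = remaining[int(np.argmin(cand))]
        best_so_far = np.minimum(best_so_far, sample_u[f])
        selected.append(f)
        remaining.remove(f)
    return selected
```

On a toy dataset where feature 0 perfectly predicts the class and feature 1 is uninformative, the first feature selected is feature 0, matching the intuition that it has zero uncertainty for every sample.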
References
Allwein, E.L., Schapire, R.E., Singer, Y. (2000). Reducing multiclass to binary: a unifying approach for margin classifiers. Journal of Machine Learning Research, 1(Dec), 113–141.
Anguita, D., Ghio, A., Oneto, L., Parra, X., Reyes-Ortiz, J.L. (2013). A public domain dataset for human activity recognition using smartphones. In ESANN.
Battiti, R. (1994). Using mutual information for selecting features in supervised neural net learning. IEEE Transactions on Neural Networks, 5(4), 537–550.
Bermingham, M.L., Pong-Wong, R., Spiliopoulou, A., Hayward, C., Rudan, I., Campbell, H., Wright, A.F., Wilson, J.F., Agakov, F., Navarro, P., Haley, C.S. (2015). Application of high-dimensional feature selection: evaluation for genomic prediction in man. Scientific Reports, 5, 10312.
Breiman, L., Friedman, J., Stone, C.J., Olshen, R.A. (1984). Classification and regression trees. Boca Raton: CRC Press.
Brown, G. (2009). A new perspective for information theoretic feature selection. In AISTATS, pp. 49–56.
Chen, Y.W., & Lin, C.J. (2006). Combining SVMs with various feature selection strategies. Feature Extraction, 207, 315–324.
Cover, T.M., & Thomas, J.A. (2012). Elements of information theory. New York: Wiley.
Dash, M., & Liu, H. (2003). Consistency-based search in feature selection. Artificial Intelligence, 151(1), 155–176.
Devijver, P.A., & Kittler, J. (1982). Pattern recognition: a statistical approach. New Jersey: Prentice hall.
Dietterich, T.G., & Bakiri, G. (1995). Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2, 263–286.
Fayyad, U., Piatetsky-Shapiro, G., Smyth, P. (1996). From data mining to knowledge discovery in databases. AI Magazine, 17(3), 37.
Fleuret, F. (2004). Fast binary feature selection with conditional mutual information. Journal of Machine Learning Research, 5(Nov), 1531–1555.
Forman, G. (2003). An extensive empirical study of feature selection metrics for text classification. Journal of Machine Learning Research, 3(Mar), 1289–1305.
Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3(Mar), 1157–1182.
Guyon, I., Weston, J., Barnhill, S., Vapnik, V. (2002). Gene selection for cancer classification using support vector machines. Machine Learning, 46(1-3), 389–422.
Hall, M.A. (2000). Correlation-based feature selection of discrete and numeric class machine learning. In Proceedings of the 17th international conference on machine learning, pp. 359–366.
Henze, N., & Penrose, M.D. (1999). On the multivariate runs test. Annals of Statistics, pp. 290–298.
Hou, Y., Zhang, P., Yan, T., Li, W., Song, D. (2010). Beyond redundancies: a metric-invariant method for unsupervised feature selection. IEEE Transactions on Knowledge and Data Engineering, 22(3), 348–364.
Huang, N.E., Shen, Z., Long, S.R., Wu, M.C., Shih, H.H., Zheng, Q., Yen, N.C., Tung, C.C., Liu, H.H. (1998). The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. In Proceedings of the Royal Society of London A: mathematical, physical and engineering sciences, vol. 454, pp. 903–995. The Royal Society.
James, G., Witten, D., Hastie, T., Tibshirani, R. (2013). An introduction to statistical learning, vol. 6. Berlin: Springer.
Jolliffe, I. (2002). Principal component analysis. Wiley Online Library.
Kariwala, V., Ye, L., Cao, Y. (2013). Branch and bound method for regression-based controlled variable selection. Computers and Chemical Engineering, 54, 1–7.
Kohavi, R., & John, G.H. (1997). Wrappers for feature subset selection. Artificial Intelligence, 97(1), 273–324.
Kwak, N., & Choi, C.H. (2002). Input feature selection for classification problems. IEEE Transactions on Neural Networks, 13(1), 143–159.
Langley, P. (1994). Selection of relevant features in machine learning. In Proceedings of the AAAI fall symposium on relevance, vol. 184, pp. 245–271.
Lichman, M. (2013). UCI machine learning repository. http://archive.ics.uci.edu/ml.
Lilliefors, H.W. (1967). On the Kolmogorov-Smirnov test for normality with mean and variance unknown. Journal of the American Statistical Association, 62(318), 399–402.
Lin, D., & Tang, X. (2006). Conditional infomax learning: an integrated framework for feature extraction and fusion. In European conference on computer vision, pp. 68–82. Springer.
Liu, H., & Motoda, H. (2012). Feature selection for knowledge discovery and data mining, vol. 454. Berlin: Springer.
Liu, H., & Yu, L. (2005). Toward integrating feature selection algorithms for classification and clustering. IEEE Transactions on Knowledge and Data Engineering, 17(4), 491–502.
Maji, P., & Pal, S.K. (2010). Feature selection using f-information measures in fuzzy approximation spaces. IEEE Transactions on Knowledge and Data Engineering, 22(6), 854–867.
Otto. (2014). Otto group product classification challenge. https://www.kaggle.com/.
Paschke, F., Bayer, C., Bator, M., Mönks, U., Dicks, A., Enge-Rosenblatt, O., Lohweg, V. (2013). Sensorlose Zustandsüberwachung an Synchronmotoren [Sensorless condition monitoring of synchronous motors]. In Proceedings. 23. Workshop Computational Intelligence, Dortmund, 5.–6. December 2013, p. 211. KIT Scientific Publishing.
Peng, H., Long, F., Ding, C. (2005). Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8), 1226–1238.
Pudil, P., Novovičová, J., Kittler, J. (1994). Floating search methods in feature selection. Pattern Recognition Letters, 15(11), 1119–1125.
Qu, G., Hariri, S., Yousif, M. (2005). A new dependency and correlation analysis for features. IEEE Transactions on Knowledge and Data Engineering, 17(9), 1199–1207.
Reyes-Ortiz, J.L., Oneto, L., Ghio, A., Samá, A., Anguita, D., Parra, X. (2014). Human activity recognition on smartphones with awareness of basic activities and postural transitions. In International conference on artificial neural networks, pp. 177–184. Springer.
Reyes-Ortiz, J.L., Oneto, L., Samà, A., Parra, X., Anguita, D. (2016). Transition-aware human activity recognition using smartphones. Neurocomputing, 171, 754–767.
Sayood, K. (2012). Introduction to data compression. Burlington: Morgan Kaufmann.
Siedlecki, W., & Sklansky, J. (1988). On automatic feature selection. International Journal of Pattern Recognition and Artificial Intelligence, 2(02), 197–220.
Somol, P., Pudil, P., Kittler, J. (2004). Fast branch & bound algorithms for optimal feature selection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(7), 900–912.
Theodoridis, S., & Koutroumbas, K. (2008). Pattern recognition. Cambridge: Academic Press.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1), 267–288.
UCI. (2014). Forest cover type prediction. https://www.kaggle.com/.
Vergara, J.R., & Estévez, P.A. (2014). A review of feature selection methods based on mutual information. Neural Computing and Applications, 24(1), 175–186.
Vidal-Naquet, M., & Ullman, S. (2003). Object recognition with informative features and linear classification. In ICCV, vol. 3, p. 281.
Wang, D., Nie, F., Huang, H. (2015). Feature selection via global redundancy minimization. IEEE Transactions on Knowledge and Data Engineering, 27(10), 2743–2755.
Weston, J., Elisseeff, A., Schölkopf, B., Tipping, M. (2003). Use of the zero-norm with linear models and kernel methods. Journal of Machine Learning Research, 3(Mar), 1439–1461.
Weston, J., Mukherjee, S., Chapelle, O., Pontil, M., Poggio, T., Vapnik, V. (2000). Feature selection for SVMs. In Advances in neural information processing systems.
Yang, H.H., & Moody, J.E. (1999). Data visualization and feature selection: new algorithms for nongaussian data. In NIPS, vol. 99, pp. 687–693. Citeseer.
Yang, S.H., & Hu, B.G. (2012). Discriminative feature selection by nonparametric bayes error minimization. IEEE Transactions on Knowledge and Data Engineering, 24(8), 1422–1434.
Yu, L., & Liu, H. (2004). Efficient feature selection via analysis of relevance and redundancy. Journal of Machine Learning Research, 5(Oct), 1205–1224.
Cite this article
Zhang, Z., Parhi, K.K. M3U: Minimum Mean Minimum Uncertainty Feature Selection for Multiclass Classification. J Sign Process Syst 92, 9–22 (2020). https://doi.org/10.1007/s11265-019-1443-6