
M3U: Minimum Mean Minimum Uncertainty Feature Selection for Multiclass Classification

Journal of Signal Processing Systems

Abstract

This paper presents a novel multiclass feature selection algorithm based on weighted conditional entropy, also referred to as uncertainty. The goal of the proposed algorithm is to select a feature subset such that, for each feature sample, at least one feature in the selected subset has a low uncertainty score. Features are first quantized into different bins. The proposed feature selection method then computes an uncertainty vector from the weighted conditional entropy. The lower the uncertainty score for a class, the better the separability of the samples in that class. Next, an iterative feature selection method selects one feature per iteration by (1) computing the minimum uncertainty score for each feature sample over all candidate feature subsets, (2) averaging these minimum uncertainty scores across all feature samples, and (3) selecting the feature that achieves the minimum of this mean of minimum uncertainty scores. Experimental results show that the proposed algorithm outperforms mRMR, achieving lower misclassification rates on various types of publicly available datasets. In most cases, the number of features necessary for a specified misclassification error is smaller than that required by traditional methods. Across all datasets, the misclassification error is reduced by 5–25% on average compared to a traditional method.
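
The selection loop described above is concrete enough to sketch. The following Python sketch is a hypothetical rendering under assumptions the abstract does not pin down: features are quantized with equal-width bins, and a sample's uncertainty under a feature is taken to be the (unweighted) conditional class entropy of the bin that sample falls into, whereas the paper uses a weighted conditional entropy. The function names (quantize, per_sample_uncertainty, m3u_select) are illustrative and not from the paper.

import numpy as np

def quantize(X, n_bins=10):
    # Equal-width binning per feature column (assumption: the paper's
    # quantization scheme is not specified in the abstract).
    Xq = np.empty_like(X, dtype=int)
    for j in range(X.shape[1]):
        edges = np.linspace(X[:, j].min(), X[:, j].max(), n_bins + 1)
        Xq[:, j] = np.clip(np.digitize(X[:, j], edges[1:-1]), 0, n_bins - 1)
    return Xq

def per_sample_uncertainty(Xq, y, n_bins):
    # For each (sample, feature) pair, the conditional class entropy of the
    # bin the sample falls into: how much uncertainty about the class that
    # feature leaves for that sample. (Unweighted stand-in for the paper's
    # weighted conditional entropy.)
    n, d = Xq.shape
    classes = np.unique(y)
    U = np.zeros((n, d))
    for j in range(d):
        for b in range(n_bins):
            idx = Xq[:, j] == b
            if not idx.any():
                continue
            p = np.array([(y[idx] == c).mean() for c in classes])
            p = p[p > 0]
            U[idx, j] = -(p * np.log2(p)).sum()   # H(C | F_j = bin b)
    return U

def m3u_select(X, y, k, n_bins=10):
    # Greedy loop: at each step add the feature that minimizes the mean
    # (over samples) of the minimum (over selected features) uncertainty.
    Xq = quantize(X, n_bins)
    U = per_sample_uncertainty(Xq, y, n_bins)
    selected = []
    best_min = np.full(X.shape[0], np.inf)   # running per-sample minimum
    for _ in range(k):
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        scores = [np.minimum(best_min, U[:, j]).mean() for j in remaining]
        j_star = remaining[int(np.argmin(scores))]
        selected.append(j_star)
        best_min = np.minimum(best_min, U[:, j_star])
    return selected

Because each greedy step keeps a running per-sample minimum, adding a feature can only lower the score a sample contributes; the feature chosen is therefore the one whose low-uncertainty samples best complement those already covered by the selected subset.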




Author information

Corresponding author: Keshab K. Parhi.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Zhang, Z., Parhi, K.K. M3U: Minimum Mean Minimum Uncertainty Feature Selection for Multiclass Classification. J Sign Process Syst 92, 9–22 (2020). https://doi.org/10.1007/s11265-019-1443-6

