Abstract
Feature selection plays a vital role as a preprocessing step for high-dimensional data in machine learning. Its basic purpose is to avoid the "curse of dimensionality" and to reduce the time and space complexity of training. Several techniques, including information-theoretic ones, have been proposed in the literature to measure the information content of a feature. Most of them incrementally select features with maximum dependency on the class label but minimum redundancy with the already selected features. A key idea missing from these techniques is a fair representation of high-dependency features across the different classes: the selection can be skewed toward features that have high mutual information (MI) with one particular class. This can bias classification in favor of that class while the other classes obtain low matching scores. We propose a novel information-theoretic approach that selects features in a class-wise fashion rather than by their global maximum dependency. In addition, a constrained search is used instead of a global sequential forward search. We prove that the proposed approach enhances maximum relevance while preserving minimum redundancy under a constrained search. Results on multiple benchmark datasets show that the proposed method improves accuracy compared with other state-of-the-art feature selection algorithms while having a lower time complexity.
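The idea sketched in the abstract can be illustrated with a minimal round-robin variant of MI-based selection. This is a hedged sketch, not the authors' CCFS algorithm: the `mutual_information` helper, the one-vs-rest relevance target, and the round-robin class schedule are illustrative assumptions; it only shows how class-wise relevance (MI of a feature with one class at a time) can be combined with an mRMR-style redundancy penalty so that each class gets a fair share of the selected features.

```python
import numpy as np

def mutual_information(x, y):
    """Empirical mutual information (in bits) between two discrete arrays."""
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))  # joint probability
            if pxy == 0.0:
                continue
            px = np.mean(x == xv)
            py = np.mean(y == yv)
            mi += pxy * np.log2(pxy / (px * py))
    return mi

def classwise_feature_selection(X, y, k):
    """Select k discrete features, taking turns over the classes.

    At each step the current class c contributes one feature: the unselected
    feature maximizing MI(f; [y == c]) (one-vs-rest relevance) minus the mean
    MI(f; g) over already selected features g (redundancy penalty).
    """
    n_features = X.shape[1]
    classes = np.unique(y)
    selected = []
    for i in range(k):
        c = classes[i % len(classes)]        # fair, class-wise turn taking
        target = (y == c).astype(int)        # one-vs-rest label for class c
        best, best_score = None, -np.inf
        for f in range(n_features):
            if f in selected:
                continue
            relevance = mutual_information(X[:, f], target)
            redundancy = (np.mean([mutual_information(X[:, f], X[:, g])
                                   for g in selected]) if selected else 0.0)
            if relevance - redundancy > best_score:
                best, best_score = f, relevance - redundancy
        selected.append(best)
    return selected
```

On a toy three-class dataset where features 0, 1, and 2 are each a perfect indicator of one class and feature 3 is noise, the round-robin schedule picks one indicator per class, whereas a purely global ranking could spend its budget on features tied to a single class.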
Data availability
The datasets used and analyzed during the current study are publicly available in the UCI repository and other sources as given below: (1) Newsgroup: https://archive.ics.uci.edu/ml/datasets/Twenty+Newsgroups. (2) 4 Universities dataset: http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/.
Notes
Available at http://qwone.com/~jason/20Newsgroups/
Funding
This work was done as part of an MS thesis by Fatima Shahzadi, supported by the Ghulam Ishaq Khan Institute of Engineering Sciences and Technology under the GA-scheme.
Ethics declarations
Conflict of interest
None.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Hussain, S.F., Shahzadi, F. & Munir, B. Constrained class-wise feature selection (CCFS). Int. J. Mach. Learn. & Cyber. 13, 3211–3224 (2022). https://doi.org/10.1007/s13042-022-01589-5