Abstract
Feature selection plays a vital role as a preprocessing step for high-dimensional data in machine learning. Its basic purpose is to avoid the "curse of dimensionality" and to reduce the time and space complexity of training. Several techniques, including information-theoretic ones, have been proposed in the literature to measure the information content of a feature. Most of them incrementally select features with maximum dependency on the class label but minimum redundancy with the already selected features. A key idea missing from these techniques is a fair representation of high-dependency features across the different classes: the selection can be skewed toward features that have high mutual information (MI) with one particular class. This can bias classification in favor of that class while the other classes obtain low matching scores. We propose a novel information-theoretic approach that selects features in a class-wise fashion rather than by their global maximum dependency. In addition, a constrained search is used instead of a global sequential forward search. We prove that the proposed approach enhances maximum relevance while preserving minimum redundancy under a constrained search. Results on multiple benchmark datasets show that the proposed method improves accuracy compared with other state-of-the-art feature selection algorithms while having a lower time complexity.
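The idea sketched in the abstract can be illustrated with a minimal round-robin variant of MI-based selection. This is a hedged sketch, not the authors' CCFS algorithm: the `mutual_information` helper, the one-vs-rest relevance target, and the round-robin class schedule are illustrative assumptions; it only shows how class-wise relevance (MI of a feature with one class at a time) can be combined with an mRMR-style redundancy penalty so that each class gets a fair share of the selected features.

```python
import numpy as np

def mutual_information(x, y):
    """Empirical mutual information (in bits) between two discrete arrays."""
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))  # joint probability
            if pxy == 0.0:
                continue
            px = np.mean(x == xv)
            py = np.mean(y == yv)
            mi += pxy * np.log2(pxy / (px * py))
    return mi

def classwise_feature_selection(X, y, k):
    """Select k discrete features, taking turns over the classes.

    At each step the current class c contributes one feature: the unselected
    feature maximizing MI(f; [y == c]) (one-vs-rest relevance) minus the mean
    MI(f; g) over already selected features g (redundancy penalty).
    """
    n_features = X.shape[1]
    classes = np.unique(y)
    selected = []
    for i in range(k):
        c = classes[i % len(classes)]        # fair, class-wise turn taking
        target = (y == c).astype(int)        # one-vs-rest label for class c
        best, best_score = None, -np.inf
        for f in range(n_features):
            if f in selected:
                continue
            relevance = mutual_information(X[:, f], target)
            redundancy = (np.mean([mutual_information(X[:, f], X[:, g])
                                   for g in selected]) if selected else 0.0)
            if relevance - redundancy > best_score:
                best, best_score = f, relevance - redundancy
        selected.append(best)
    return selected
```

On a toy three-class dataset where features 0, 1, and 2 are each a perfect indicator of one class and feature 3 is noise, the round-robin schedule picks one indicator per class, whereas a purely global ranking could spend its budget on features tied to a single class.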
Data availability
The datasets used and analyzed during the current study are publicly available in the UCI repository and other sources as given below: (1) Newsgroup: https://archive.ics.uci.edu/ml/datasets/Twenty+Newsgroups. (2) 4 Universities dataset: http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/.
Notes
Available at http://qwone.com/~jason/20Newsgroups/
Funding
This work was done as part of an MS thesis by Fatima Shahzadi, supported by the Ghulam Ishaq Khan Institute of Engineering Sciences and Technology under the GA-scheme.
Ethics declarations
Conflict of interest
None.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Hussain, S.F., Shahzadi, F. & Munir, B. Constrained class-wise feature selection (CCFS). Int. J. Mach. Learn. & Cyber. 13, 3211–3224 (2022). https://doi.org/10.1007/s13042-022-01589-5