A sub-concept-based feature selection method for one-class classification

Liu, Zhen; Japkowicz, Nathalie; Wang, Ruoyu; Liu, Li

doi:10.1007/s00500-020-04828-5

Foundations
Published: 10 March 2020

Volume 24, pages 7047–7062, (2020)
Soft Computing

Zhen Liu^1,2,5, Nathalie Japkowicz^2, Ruoyu Wang^3,4 & Li Liu^2,6

301 Accesses
2 Citations

Abstract

Like binary classification methods, one-class classification methods can benefit from feature selection. However, feature selection algorithms designed for binary or multi-class problems are not applicable to one-class classification, since instances of only one class are available. Few feature selection techniques have been proposed so far for one-class classification. This paper designs a filter-based feature selection method for one-class classification. Our approach is based on the observation that for some tasks, such as outlier detection and anomaly detection, the training data (normal data) may contain multiple sub-concepts. These sub-concepts are a source of data complexity. Our approach searches for the features that make the instances of each sub-concept more compact, so as to reduce the data complexity. It first finds the sub-concepts using a clustering algorithm with a fixed number of clusters and then applies combined feature measures to evaluate the relevance of each feature to the sub-concepts. A fixed number of features, those with the highest relevance scores, are selected as a feature subset. During the search, the Davies–Bouldin Index (DBI) is used to assess the data complexity of the sub-concepts obtained with different numbers of clusters. The feature subset with the lowest DBI is selected as the final feature subset. Experiments on UCI benchmark and cyber security datasets demonstrate that our feature selection algorithm can select relevant features and improve the performance of one-class classification on multimodal data.
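The search loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names neither the clustering algorithm nor the combined feature measures, so plain k-means and a between/within-cluster variance ratio are used here as stand-ins. Computing the DBI on the selected features with the labels found on the full data is likewise an assumption. All function names (`kmeans`, `relevance`, `davies_bouldin`, `select_features`) and the synthetic data are hypothetical.

```python
import math
import random

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means (stand-in clusterer); returns one label per row."""
    centers = random.Random(seed).sample(X, k)
    labels = [0] * len(X)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(x, centers[c])) for x in X]
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def relevance(X, labels, f):
    """Stand-in relevance score: between-cluster over within-cluster variance."""
    col = [x[f] for x in X]
    mean = sum(col) / len(col)
    between = within = 0.0
    for c in set(labels):
        vals = [v for v, lab in zip(col, labels) if lab == c]
        m = sum(vals) / len(vals)
        between += len(vals) * (m - mean) ** 2
        within += sum((v - m) ** 2 for v in vals)
    return between / (within + 1e-12)

def davies_bouldin(X, labels):
    """Davies-Bouldin index; lower means compact, well separated clusters."""
    ids = sorted(set(labels))
    if len(ids) < 2:
        return math.inf
    groups = {c: [x for x, lab in zip(X, labels) if lab == c] for c in ids}
    cent = {c: [sum(col) / len(g) for col in zip(*g)] for c, g in groups.items()}
    scat = {c: sum(math.dist(x, cent[c]) for x in g) / len(g)
            for c, g in groups.items()}

    def ratio(i, j):
        d = math.dist(cent[i], cent[j])
        return (scat[i] + scat[j]) / d if d > 0 else math.inf

    return sum(max(ratio(i, j) for j in ids if j != i) for i in ids) / len(ids)

def select_features(X, n_keep, ks=(2, 3, 4)):
    """For each cluster count, keep the n_keep most relevant features and
    return the (DBI, feature subset) pair with the lowest DBI."""
    best = (math.inf, None)
    for k in ks:
        labels = kmeans(X, k)
        ranked = sorted(range(len(X[0])),
                        key=lambda f: relevance(X, labels, f), reverse=True)
        subset = sorted(ranked[:n_keep])
        projected = [[x[f] for f in subset] for x in X]
        dbi = davies_bouldin(projected, labels)
        if dbi < best[0]:
            best = (dbi, subset)
    return best

# Synthetic "normal" data: two sub-concepts separated along features 0 and 1,
# while features 2 and 3 are uniform noise.
rng = random.Random(1)
X = []
for center in (0.0, 10.0):
    for _ in range(40):
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0),
                  rng.random(), rng.random()])

dbi, subset = select_features(X, n_keep=2)
print(subset, round(dbi, 3))
```

In this toy setting the two informative features dominate the relevance ranking because the sub-concepts differ only along them. On real data the features would normally be scaled before clustering, and the choice of relevance measure matters far more than it does here.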

Notes

ODDS library, https://odds.cs.stonybrook.edu.
These are the results of the only filter-based feature selection approach for one-class classification that we have found.
The MNIST dataset is converted for outlier detection by treating the digit-zero class as inliers and sampling 700 images from the digit-six class as outliers.

Acknowledgements

We thank the anonymous reviewers for their constructive comments. This work is supported by the National Natural Science Foundation of China (Grant No. 61501128), the China Scholarship Council, the Natural Science Foundation of Guangdong Province (Nos. 2017A030313345, 2016A030310300), the Specialized Fund for the Basic Research Operating Expenses Program of Central College (No. x2rj/D2174870), and the Young Innovative Talents Project of Guangdong Universities (Grant No. 2017KQNCX107).

Author information

Authors and Affiliations

School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, China
Zhen Liu
Department of Computer Science, American University, Washington, DC, 20016, USA
Zhen Liu, Nathalie Japkowicz & Li Liu
Information and Network Engineering and Research Center, South China University of Technology, Guangzhou, 510041, China
Ruoyu Wang
Communication and Computer Network Lab of Guangdong, Guangzhou, 510041, China
Ruoyu Wang
Guangdong Province Precise Medicine and Big Data Engineering Technology Research Center for Traditional Chinese Medicine, Guangzhou, 510006, China
Zhen Liu
Department of Computer Science, Huizhou University, Huizhou, 516007, China
Li Liu

Corresponding author

Correspondence to Ruoyu Wang.

Ethics declarations

Conflict of interest

We declare that we have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Communicated by A. Di Nola.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liu, Z., Japkowicz, N., Wang, R. et al. A sub-concept-based feature selection method for one-class classification. Soft Comput 24, 7047–7062 (2020). https://doi.org/10.1007/s00500-020-04828-5

Published: 10 March 2020
Issue Date: May 2020
DOI: https://doi.org/10.1007/s00500-020-04828-5
