
Novel artificial bee colony based feature selection method for filtering redundant information


Abstract

Feature selection, which reduces the dimensionality of the feature space without sacrificing classifier performance, is an effective technique for text classification. Because many classifiers cannot cope with high-dimensional features, filtering redundant information out of the original feature space has become one of the core goals of the feature selection field. In this paper, the concept of the equivalence word set is introduced, and a set of equivalence word sets (denoted EWS1) is constructed using the rich semantic information of the Open Directory Project (ODP). On this basis, an artificial bee colony based feature selection method is proposed for filtering redundant information, and a feature subset FS is obtained using an optimal feature selection (OFS) method and two predetermined thresholds. To obtain the best values for these two thresholds, an improved memory-based artificial bee colony method (IABCM) is proposed. In the experiments, fuzzy support vector machine (FSVM) and Naïve Bayesian (NB) classifiers are used on six datasets: LingSpam, WebKB, SpamAssassin, 20-Newsgroups, Reuters21578, and TREC 2007. Experimental results verify that with both FSVM and NB, the proposed method is efficient and achieves better accuracy than several representative feature selection methods.
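As a rough illustration of the colony-search step described above, the sketch below runs a plain artificial bee colony over two threshold values in [0, 1]. It is a minimal sketch under stated assumptions, not the paper's method: the parameter settings, the [0, 1] threshold range, and the `fitness` surrogate (a smooth function standing in for classifier accuracy on the feature subset selected by the two thresholds) are all hypothetical, and the memory mechanism that distinguishes IABCM from the base algorithm is not reproduced here.

```python
# Minimal artificial bee colony (ABC) sketch for tuning two thresholds.
# Hypothetical stand-ins: the fitness surrogate, the [0, 1] range, and all
# parameter values. The paper's IABCM adds a memory mechanism not shown here;
# its real objective is classifier accuracy on the selected feature subset.
import numpy as np

rng = np.random.default_rng(0)
N_SOURCES, DIM, LIMIT, MAX_ITERS = 10, 2, 20, 100  # assumed settings
LOW, HIGH = 0.0, 1.0                               # assumed threshold range

def fitness(x):
    # Surrogate objective: in the paper this would be the accuracy obtained
    # with the feature subset induced by thresholds x[0] and x[1].
    return np.exp(-((x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2) / 0.02)

sources = rng.uniform(LOW, HIGH, (N_SOURCES, DIM))  # food sources = threshold pairs
fits = np.array([fitness(s) for s in sources])
trials = np.zeros(N_SOURCES, dtype=int)             # stagnation counters

def neighbor(i):
    # Standard ABC move: nudge one dimension of source i relative to a
    # randomly chosen partner source k.
    k = rng.choice([j for j in range(N_SOURCES) if j != i])
    d = rng.integers(DIM)
    cand = sources[i].copy()
    cand[d] += rng.uniform(-1.0, 1.0) * (sources[i][d] - sources[k][d])
    return np.clip(cand, LOW, HIGH)

def try_improve(i):
    # Greedy selection between a source and its neighbor candidate.
    cand = neighbor(i)
    f = fitness(cand)
    if f > fits[i]:
        sources[i], fits[i], trials[i] = cand, f, 0
    else:
        trials[i] += 1

for _ in range(MAX_ITERS):
    for i in range(N_SOURCES):          # employed bee phase
        try_improve(i)
    probs = fits / fits.sum()           # onlookers revisit fitter sources more often
    for _ in range(N_SOURCES):
        try_improve(rng.choice(N_SOURCES, p=probs))
    for i in range(N_SOURCES):          # scout phase: abandon exhausted sources
        if trials[i] > LIMIT:
            sources[i] = rng.uniform(LOW, HIGH, DIM)
            fits[i] = fitness(sources[i])
            trials[i] = 0

best = int(np.argmax(fits))
print("best thresholds:", sources[best], "surrogate fitness:", fits[best])
```

The scout phase is what keeps the search from stalling: any threshold pair that fails to improve for LIMIT consecutive attempts is abandoned and replaced with a fresh random candidate, trading a little exploitation for renewed exploration.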



Acknowledgments

This research was supported by the Beijing Natural Science Foundation under grant no. 4174105, the Joint Funds of the National Natural Science Foundation of China under grant no. U1509214, and the Discipline Construction Foundation of the Central University of Finance and Economics under grant no. 2016XX02.

Author information


Correspondence to Youwei Wang.


About this article


Cite this article

Wang, Y., Feng, L. & Zhu, J. Novel artificial bee colony based feature selection method for filtering redundant information. Appl Intell 48, 868–885 (2018). https://doi.org/10.1007/s10489-017-1010-4
