Abstract
Feature selection is a type of optimization problem that is generally solved by combining an optimization algorithm with a classifier. Genetic algorithms and particle swarm optimization (PSO) are two commonly used optimization algorithms. Recently, cat swarm optimization (CSO) was proposed and shown to outperform PSO; however, CSO suffers from long computation times. In this paper, we modify CSO to obtain an improved algorithm, ICSO, and apply it to feature selection in a text classification experiment on big data. The results show that the proposed ICSO outperforms traditional CSO and that, for big data classification, using term frequency-inverse document frequency (TF-IDF) together with ICSO for feature selection is more accurate than using TF-IDF alone.
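The following minimal Python sketch (not the authors' implementation) illustrates the wrapper-style setup described above: documents are converted to TF-IDF vectors, a population of binary feature masks is searched by random bit-flip moves in the spirit of CSO's seeking mode, and each mask is scored by the cross-validated accuracy of an SVM classifier. The toy documents, labels, population size, and flip rate are hypothetical placeholders.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

# Hypothetical toy corpus and binary labels (placeholders for a real big data corpus).
docs = ["good cheap taiwanese noodles", "fresh seafood market stall",
        "bubble tea shop review", "night market fried chicken"]
labels = np.array([0, 1, 0, 1])

# TF-IDF term-document matrix (dense here only to keep the indexing simple).
X = TfidfVectorizer().fit_transform(docs).toarray()
n_features = X.shape[1]

def fitness(mask):
    """Cross-validated SVM accuracy on the TF-IDF columns selected by the mask."""
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(LinearSVC(), X[:, mask.astype(bool)], labels, cv=2).mean()

rng = np.random.default_rng(0)
population = rng.integers(0, 2, size=(10, n_features))  # 10 candidate masks ("cats")

for _ in range(20):                            # simple seeking-mode-style local search
    for i, cat in enumerate(population):
        candidate = cat.copy()
        flips = rng.random(n_features) < 0.1   # flip roughly 10% of the bits
        candidate[flips] ^= 1
        if fitness(candidate) >= fitness(cat):
            population[i] = candidate          # keep the better mask

best = max(population, key=fitness)
print("selected", int(best.sum()), "of", n_features,
      "features; accuracy:", round(fitness(best), 3))
```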