Abstract
This paper proposes a document clustering method that uses a hybrid feature selection method. The hybrid method, named Memetic Algorithm-Feature Selection (MA-FS), integrates a genetic-based wrapper with a ranking filter. MA-FS is combined with the K-means and Spherical K-means (SK-means) clustering methods to perform document clustering. For comparison, another unsupervised feature selection method, Feature Selection Genetic Text Clustering (FSGATC), is used. The comparisons use two real-world criminal report document sets along with two popular benchmark datasets, Reuters and 20newsgroup. F-Micro, F-Macro and Average Distance of Document to Cluster (ADDC) measures are used for evaluation. The test results show that MA-FS outperforms FSGATC, and that it also outperforms clustering on the entire feature space (ALL).
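To illustrate how a selected feature subset feeds into spherical K-means, the following is a minimal sketch and not the authors' MA-FS implementation. It assumes a hand-picked binary feature mask (in MA-FS the mask would be found by the memetic search), a toy term-frequency matrix, evenly spaced seed documents, and a mean cosine distance to the assigned centroid as a stand-in for an ADDC-style measure. The exact ADDC formula is not given in the abstract, and all function names here are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def apply_mask(doc, mask):
    """Keep only the features whose mask bit is 1 (the selected subset)."""
    return [x for x, keep in zip(doc, mask) if keep]

def spherical_kmeans(docs, k, iters=20):
    """Cluster unit-length vectors by cosine similarity.

    Seeds are evenly spaced documents: an arbitrary, deterministic choice
    made for this sketch, not an initialization taken from the paper.
    """
    docs = [normalize(d) for d in docs]
    n = len(docs)
    centroids = [list(docs[i * n // k]) for i in range(k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each document goes to the most similar centroid.
        new_labels = [max(range(k), key=lambda c: dot(d, centroids[c]))
                      for d in docs]
        # Update step: mean of the members, projected back onto the unit sphere.
        for c in range(k):
            members = [d for d, lab in zip(docs, new_labels) if lab == c]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = normalize(mean)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids, docs

def mean_cosine_distance(docs, labels, centroids):
    """Average of (1 - cosine similarity) between documents and their centroid.

    An assumed ADDC-style measure; lower means tighter clusters.
    """
    return sum(1.0 - dot(d, centroids[lab])
               for d, lab in zip(docs, labels)) / len(docs)

# Toy term-frequency rows (documents x features), invented for illustration.
toy_docs = [
    [3, 0, 1, 2],
    [4, 0, 2, 1],
    [0, 3, 2, 1],
    [0, 5, 1, 2],
]
toy_mask = [1, 1, 0, 0]  # hypothetical subset: keep features 0 and 1 only

labels_sel, cents_sel, unit_sel = spherical_kmeans(
    [apply_mask(d, toy_mask) for d in toy_docs], k=2)
labels_all, cents_all, unit_all = spherical_kmeans(toy_docs, k=2)

dist_sel = mean_cosine_distance(unit_sel, labels_sel, cents_sel)
dist_all = mean_cosine_distance(unit_all, labels_all, cents_all)
```

On this toy data the masked run yields tighter clusters than the run on all features, which is the kind of comparison the ALL baseline refers to. The numbers carry no weight beyond illustrating the mechanics.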
Acknowledgment
Ibraheem wants to thank the Higher Committee for Education Development in Iraq (HCED) for the funding of his scholarship.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Al-Jadir, I., Wong, K.W., Fung, C.C., Xie, H. (2017). Text Dimensionality Reduction for Document Clustering Using Hybrid Memetic Feature Selection. In: Phon-Amnuaisuk, S., Ang, SP., Lee, SY. (eds) Multi-disciplinary Trends in Artificial Intelligence. MIWAI 2017. Lecture Notes in Computer Science, vol 10607. Springer, Cham. https://doi.org/10.1007/978-3-319-69456-6_23
DOI: https://doi.org/10.1007/978-3-319-69456-6_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69455-9
Online ISBN: 978-3-319-69456-6
eBook Packages: Computer Science, Computer Science (R0)