Text Dimensionality Reduction for Document Clustering Using Hybrid Memetic Feature Selection

  • Conference paper
Multi-disciplinary Trends in Artificial Intelligence (MIWAI 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10607)

Abstract

This paper proposes a document clustering method built on a hybrid feature selection method that integrates a genetic-based wrapper with a ranking filter. The method is named Memetic Algorithm-Feature Selection (MA-FS). MA-FS is combined with the K-means and Spherical K-means (SK-means) clustering methods to perform document clustering. For comparison, another unsupervised feature selection method, Feature Selection Genetic Text Clustering (FSGATC), is used. Two real-world criminal report document sets were used in the comparisons, along with two popular benchmark datasets, Reuters and 20newsgroup. The F-Micro, F-Macro and Average Distance of Document to Cluster (ADDC) measures were used for evaluation. The test results showed that MA-FS outperformed FSGATC, and that it also outperformed clustering on the entire feature space (ALL).
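
The abstract gives no implementation details, so the following is only a rough sketch of how a genetic wrapper might be combined with a ranking filter in a memetic loop. Everything in it is an assumption made for illustration and should not be read as the authors' MA-FS: term variance as the ranking filter, a cosine-based ADDC as the wrapper fitness, a single-swap local search, a small spherical k-means, and all function names and parameter values.

```python
import numpy as np

def spherical_kmeans(X, k, iters=20, seed=0):
    """Cluster rows of X by cosine similarity (assumed SK-means variant)."""
    rng = np.random.default_rng(seed)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    U = X / np.where(norms == 0, 1.0, norms)            # unit-length documents
    centroids = U[rng.choice(len(U), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmax(U @ centroids.T, axis=1)     # most similar centroid
        for c in range(k):
            total = U[labels == c].sum(axis=0)
            length = np.linalg.norm(total)
            if length > 0:                              # skip empty clusters
                centroids[c] = total / length
    labels = np.argmax(U @ centroids.T, axis=1)
    return labels, centroids, U

def addc(U, labels, centroids):
    """Mean over clusters of the mean cosine distance to the centroid (assumed form)."""
    dists = 1.0 - np.sum(U * centroids[labels], axis=1)
    return float(np.mean([dists[labels == c].mean() for c in np.unique(labels)]))

def fitness(X, mask, k):
    """Wrapper fitness: ADDC of the clustering on the selected terms (lower is better)."""
    if not mask.any():
        return np.inf
    labels, centroids, U = spherical_kmeans(X[:, mask], k)
    return addc(U, labels, centroids)

def local_search(X, mask, rank, k):
    """Filter-guided step: swap the weakest selected term for the strongest unselected one."""
    selected = [f for f in rank if mask[f]]             # rank is ordered best first
    unselected = [f for f in rank if not mask[f]]
    if not selected or not unselected:
        return mask
    trial = mask.copy()
    trial[selected[-1]] = False
    trial[unselected[0]] = True
    return trial if fitness(X, trial, k) < fitness(X, mask, k) else mask

def memetic_feature_selection(X, k, pop_size=8, generations=10, p_mut=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    rank = np.argsort(-X.var(axis=0))                   # hypothetical filter: term variance
    pop = rng.random((pop_size, n_features)) < 0.5      # random boolean feature masks
    pop[0] = True                                       # seed with the full feature set (ALL)
    for _ in range(generations):
        scores = np.array([fitness(X, m, k) for m in pop])
        children = [pop[np.argmin(scores)].copy()]      # elitism: keep the best mask
        while len(children) < pop_size:
            a, b = rng.choice(pop_size, 2, replace=False)   # binary tournaments
            p1 = pop[a] if scores[a] < scores[b] else pop[b]
            c, d = rng.choice(pop_size, 2, replace=False)
            p2 = pop[c] if scores[c] < scores[d] else pop[d]
            child = np.where(rng.random(n_features) < 0.5, p1, p2)  # uniform crossover
            child ^= rng.random(n_features) < p_mut                 # bit-flip mutation
            children.append(local_search(X, child, rank, k))
        pop = np.array(children)
    scores = np.array([fitness(X, m, k) for m in pop])
    return pop[np.argmin(scores)], float(scores.min())
```

Calling `memetic_feature_selection(X, k=2)` on a documents-by-terms matrix `X` returns a boolean mask over the terms and its fitness. Because the full feature set is seeded into the initial population and the best mask is carried forward unchanged, the returned fitness is never worse than that of using all features. Note that minimising ADDC alone tends to favour very small term subsets, so a practical fitness would likely need an extra constraint or penalty.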

Acknowledgment

Ibraheem Al-Jadir thanks the Higher Committee for Education Development in Iraq (HCED) for funding his scholarship.

Author information

Corresponding author

Correspondence to Ibraheem Al-Jadir.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Al-Jadir, I., Wong, K.W., Fung, C.C., Xie, H. (2017). Text Dimensionality Reduction for Document Clustering Using Hybrid Memetic Feature Selection. In: Phon-Amnuaisuk, S., Ang, SP., Lee, SY. (eds) Multi-disciplinary Trends in Artificial Intelligence. MIWAI 2017. Lecture Notes in Computer Science, vol 10607. Springer, Cham. https://doi.org/10.1007/978-3-319-69456-6_23

  • DOI: https://doi.org/10.1007/978-3-319-69456-6_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69455-9

  • Online ISBN: 978-3-319-69456-6

  • eBook Packages: Computer Science, Computer Science (R0)
