
A Survey on Filter Techniques for Feature Selection in Text Mining

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 236)

Abstract

A large portion of a document usually consists of irrelevant features. Rather than helping to identify the actual context of the document, such features increase the dimensionality of the representation model and the computational complexity of the underlying algorithm, and hence adversely affect performance. This makes it necessary to select the relevant features from the given feature space. Feature selection plays a key role here by removing irrelevant features from the original feature space. Feature selection methods are broadly categorized into three groups: filter, wrapper, and embedded. Filter methods are widely used in text mining because of their simplicity, low computational complexity, and efficiency. In this article, we provide a brief survey of filter feature selection methods, along with some recent developments in this area.
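
As a rough illustration of the filter idea described in the abstract, the sketch below scores every term independently of any learning algorithm and keeps only the top-ranked ones. The chi-square statistic is used here as one commonly used filter criterion; that choice, the function names `chi_square_scores` and `select_top_k`, and the tokenized-document input format are all assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter


def chi_square_scores(docs, labels, positive):
    """Score each term by the chi-square statistic between
    term presence and membership in the `positive` class.

    docs   -- list of token lists (hypothetical input format)
    labels -- list of class labels, one per document
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = n - n_pos

    # Document frequencies of each term per class (presence, not counts).
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        (df_pos if y == positive else df_neg).update(set(tokens))

    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # positive docs containing the term
        b = df_neg[term]   # negative docs containing the term
        c = n_pos - a      # positive docs without the term
        d = n_neg - b      # negative docs without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        # A term present in every document carries no class information.
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Because each term is scored on its own from simple counts, the cost grows roughly linearly with the size of the corpus and vocabulary, which is the kind of efficiency the abstract attributes to filter methods. A wrapper method would instead have to retrain a classifier for each candidate feature subset.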



Corresponding author

Correspondence to Kusum Kumari Bharti.


Copyright information

© 2014 Springer India

About this paper

Cite this paper

Bharti, K.K., Singh, P.K. (2014). A Survey on Filter Techniques for Feature Selection in Text Mining. In: Babu, B., et al. Proceedings of the Second International Conference on Soft Computing for Problem Solving (SocProS 2012), December 28-30, 2012. Advances in Intelligent Systems and Computing, vol 236. Springer, New Delhi. https://doi.org/10.1007/978-81-322-1602-5_154


  • DOI: https://doi.org/10.1007/978-81-322-1602-5_154


  • Publisher Name: Springer, New Delhi

  • Print ISBN: 978-81-322-1601-8

  • Online ISBN: 978-81-322-1602-5

  • eBook Packages: Engineering, Engineering (R0)
