
Higher order feature selection for text classification

  • Regular Paper
  • Published in Knowledge and Information Systems

Abstract

In this paper, we present the MIFS-C variant of the mutual information feature-selection (MIFS) algorithms. We present an algorithm for finding the optimal value of the redundancy parameter, a key parameter in MIFS-type algorithms. Furthermore, we present an algorithm that speeds up the execution of all the MIFS variants. Overall, the presented MIFS-C achieves classification accuracy comparable with, and in some cases better than, that of the other MIFS algorithms, while running faster. We also compared this feature selector with other feature selectors and found that it performs better in most cases. MIFS-C performed especially well on the break-even point and F-measure, because the algorithm can be tuned to optimise these evaluation measures.
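The greedy MIFS criterion that the MIFS-C variant builds on scores each candidate feature by its relevance to the class minus a redundancy penalty weighted by the redundancy parameter. The sketch below is a minimal illustration of that Battiti-style selection rule, not code from the paper; the `mi` helper, the `beta` default, and the toy data are all illustrative assumptions.

```python
# Minimal sketch of greedy MIFS-style feature selection.
# Score for a candidate f: I(f; C) - beta * sum of I(f; s) over selected s.
from collections import Counter
from math import log2

def mi(xs, ys):
    """Empirical mutual information I(X;Y) for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def mifs(features, labels, k, beta=0.7):
    """Greedily pick k features; beta is the redundancy parameter."""
    selected, remaining = [], dict(features)
    while remaining and len(selected) < k:
        def score(name):
            relevance = mi(remaining[name], labels)
            redundancy = sum(mi(remaining[name], features[s]) for s in selected)
            return relevance - beta * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected

labels = [0, 0, 1, 1, 0, 1]
feats = {"informative": [0, 0, 1, 1, 0, 1],   # perfectly predictive of labels
         "noisy":       [1, 0, 1, 0, 0, 1]}   # only weakly related
print(mifs(feats, labels, 1))  # selects "informative" first
```

The redundancy parameter `beta` in this sketch is the quantity whose optimal value the MIFS-C tuning algorithm described in the abstract is meant to find.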



Author information

Corresponding author

Correspondence to Jan Bakus.

Additional information

Jan Bakus received the B.A.Sc. and M.A.Sc. degrees in electrical engineering from the University of Waterloo, Waterloo, ON, Canada, in 1996 and 1998, respectively, and the Ph.D. degree in systems design engineering in 2005. He is currently working at Maplesoft, Waterloo, ON, Canada as an applications engineer, where he is responsible for the development of application-specific toolboxes for the Maple scientific computing software.

His research interests are in the area of feature selection for text classification, text classification, text clustering, and information retrieval. He is the recipient of the Carl Pollock Fellowship award from the University of Waterloo and the Datatel Scholars Foundation scholarship from Datatel.

Mohamed S. Kamel holds a Ph.D. in computer science from the University of Toronto, Canada. He is at present Professor and Director of the Pattern Analysis and Machine Intelligence Laboratory in the Department of Electrical and Computing Engineering, University of Waterloo, Canada. Professor Kamel holds a Canada Research Chair in Cooperative Intelligent Systems.

Dr. Kamel's research interests are in machine intelligence, neural networks and pattern recognition with applications in robotics and manufacturing. He has authored and coauthored over 200 papers in journals and conference proceedings, 2 patents and numerous technical and industrial project reports. Under his supervision, 53 Ph.D. and M.A.Sc. students have completed their degrees.

Dr. Kamel is a member of ACM, AAAI, CIPS and APEO and was named a Fellow of the IEEE in 2005. He is the editor-in-chief of the International Journal of Robotics and Automation, Associate Editor of the IEEE SMC, Part A, the International Journal of Image and Graphics, and Pattern Recognition Letters, and is a member of the editorial board of Intelligent Automation and Soft Computing. He has served as a consultant to many companies, including NCR, IBM, Nortel, VRP and CSA. He is a member of the board of directors and cofounder of Virtek Vision International in Waterloo.

Cite this article

Bakus, J., Kamel, M.S. Higher order feature selection for text classification. Knowl Inf Syst 9, 468–491 (2006). https://doi.org/10.1007/s10115-005-0209-6
