Abstract
In this paper, we present MIFS-C, a variant of the mutual information feature-selection (MIFS) algorithms. We present an algorithm to find the optimal value of the redundancy parameter, which is a key parameter in MIFS-type algorithms. Furthermore, we present an algorithm that reduces the execution time of all the MIFS variants. Overall, MIFS-C achieves classification accuracy comparable to (and in some cases better than) that of other MIFS algorithms, while running faster. We also compared MIFS-C with other feature selectors and found that it performs better in most cases. MIFS-C performed especially well on the breakeven and F-measure, because the algorithm can be tuned to optimise these evaluation measures.
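To make the role of the redundancy parameter concrete, the following is a minimal sketch of the greedy selection criterion used by the original MIFS algorithm (Battiti, 1994), in which each candidate feature f is scored as I(C; f) minus beta times the sum of I(f; s) over the already selected features s. It is not the MIFS-C procedure itself, whose details the abstract does not give. The function names, the dictionary-based data layout, and the toy data are assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mifs(features, labels, k, beta):
    """Greedy selection with a Battiti-style MIFS criterion (illustrative).

    features: dict mapping a feature name to a list of discrete values
    labels:   list of class labels, same length as each feature list
    k:        number of features to select
    beta:     redundancy parameter; beta = 0 ignores redundancy entirely
    """
    # Relevance of each feature to the class, computed once.
    relevance = {f: mutual_information(v, labels) for f, v in features.items()}
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        def score(f):
            # Penalise information shared with already selected features.
            redundancy = sum(
                mutual_information(features[f], features[s]) for s in selected
            )
            return relevance[f] - beta * redundancy

        best = max(remaining, key=score)  # ties go to the earliest candidate
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: four classes, two complementary binary features,
# and an exact duplicate of the first feature.
labels = [0, 1, 2, 3, 0, 1, 2, 3]
features = {
    "a": [0, 0, 1, 1, 0, 0, 1, 1],
    "a_dup": [0, 0, 1, 1, 0, 0, 1, 1],
    "b": [0, 1, 0, 1, 0, 1, 0, 1],
}
without_penalty = mifs(features, labels, k=2, beta=0.0)
with_penalty = mifs(features, labels, k=2, beta=1.0)
```

On this toy data all three features are equally relevant to the class. With beta = 0 the duplicate is selected second, because nothing penalises the overlap with "a"; with beta = 1 the duplicate's score drops to zero and the complementary feature "b" is selected instead. This sensitivity to beta is why choosing the parameter well matters for MIFS-type selectors.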
Author information
Jan Bakus received the B.A.Sc. and M.A.Sc. degrees in electrical engineering from the University of Waterloo, Waterloo, ON, Canada, in 1996 and 1998, respectively, and the Ph.D. degree in systems design engineering in 2005. He is currently working at Maplesoft, Waterloo, ON, Canada, as an applications engineer, where he is responsible for the development of application-specific toolboxes for the Maple scientific computing software.
His research interests are in the area of feature selection for text classification, text classification, text clustering, and information retrieval. He is the recipient of the Carl Pollock Fellowship award from the University of Waterloo and the Datatel Scholars Foundation scholarship from Datatel.
Mohamed S. Kamel holds a Ph.D. in computer science from the University of Toronto, Canada. He is at present Professor and Director of the Pattern Analysis and Machine Intelligence Laboratory in the Department of Electrical and Computer Engineering, University of Waterloo, Canada. Professor Kamel holds a Canada Research Chair in Cooperative Intelligent Systems.
Dr. Kamel's research interests are in machine intelligence, neural networks and pattern recognition with applications in robotics and manufacturing. He has authored and coauthored over 200 papers in journals and conference proceedings, 2 patents and numerous technical and industrial project reports. Under his supervision, 53 Ph.D. and M.A.Sc. students have completed their degrees.
Dr. Kamel is a member of ACM, AAAI, CIPS and APEO and has been named a Fellow of IEEE (2005). He is the editor-in-chief of the International Journal of Robotics and Automation, Associate Editor of the IEEE SMC, Part A, the International Journal of Image and Graphics, and Pattern Recognition Letters, and is a member of the editorial board of Intelligent Automation and Soft Computing. He has served as a consultant to many companies, including NCR, IBM, Nortel, VRP and CSA. He is a member of the board of directors and cofounder of Virtek Vision International in Waterloo.
Cite this article
Bakus, J., Kamel, M.S. Higher order feature selection for text classification. Knowl Inf Syst 9, 468–491 (2006). https://doi.org/10.1007/s10115-005-0209-6