Abstract
Email categorization has become very popular in personal information management. However, most n-way classification methods suffer from the feature unevenness problem: the features learned from training samples are distributed unevenly across folders. We argue that binarization approaches can handle this problem effectively. In this paper, three binarization techniques are implemented, namely one-against-rest, one-against-one and some-against-rest, using two assembling techniques, namely round robin and elimination. Experiments on email categorization show that these binarization approaches achieve significant improvement over an n-way baseline classifier.
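The decomposition and assembling schemes named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The binary learner is a toy centroid-difference scorer standing in for whatever classifier the paper actually uses. All function names and the three-folder training data are hypothetical. "Elimination" is assumed here to mean a knockout sequence in which the loser of each pairwise comparison is dropped until one folder remains. Some-against-rest is omitted because the abstract does not say how its class groups are formed.

```python
from collections import Counter
from itertools import combinations


def bag_of_words(text):
    # Hypothetical feature extractor: lower-cased whitespace tokens with counts.
    return Counter(text.lower().split())


class CentroidBinary:
    """Toy binary classifier (an assumption, not the paper's learner).

    score(doc) > 0 means the document leans toward the positive side.
    """

    def __init__(self, pos_docs, neg_docs):
        self.pos = self._centroid(pos_docs)
        self.neg = self._centroid(neg_docs)

    @staticmethod
    def _centroid(docs):
        total = Counter()
        for d in docs:
            total.update(d)
        n = max(len(docs), 1)
        return {w: c / n for w, c in total.items()}

    def score(self, doc):
        return sum(c * (self.pos.get(w, 0.0) - self.neg.get(w, 0.0))
                   for w, c in doc.items())


def train_one_against_rest(data):
    # One binary model per folder: that folder versus all other folders pooled.
    models = {}
    for f in data:
        rest = [d for g, docs in data.items() if g != f for d in docs]
        models[f] = CentroidBinary(data[f], rest)
    return models


def predict_one_against_rest(models, doc):
    # Pick the folder whose "folder vs rest" model is most confident.
    return max(models, key=lambda f: models[f].score(doc))


def train_one_against_one(data):
    # One binary model per unordered folder pair, keyed by the sorted pair.
    return {(a, b): CentroidBinary(data[a], data[b])
            for a, b in combinations(sorted(data), 2)}


def predict_round_robin(pair_models, doc):
    # Every pairwise model casts one vote; ties go to the alphabetically first folder.
    votes = Counter()
    for (a, b), m in pair_models.items():
        votes[a if m.score(doc) > 0 else b] += 1
    return max(sorted(votes), key=lambda f: votes[f])


def predict_elimination(pair_models, folders, doc):
    # Knockout: compare the first two survivors and drop the loser each round.
    remaining = sorted(folders)
    while len(remaining) > 1:
        a, b = remaining[0], remaining[1]  # still sorted, so (a, b) is a valid key
        loser = b if pair_models[(a, b)].score(doc) > 0 else a
        remaining.remove(loser)
    return remaining[0]


# Hypothetical toy mailbox with three folders.
TRAIN = {
    "work": ["meeting agenda project deadline", "project report deadline"],
    "family": ["dinner mom weekend", "birthday party mom"],
    "spam": ["win cash prize now", "cheap prize offer now"],
}
data = {f: [bag_of_words(t) for t in texts] for f, texts in TRAIN.items()}
oar = train_one_against_rest(data)
oao = train_one_against_one(data)

query = bag_of_words("project deadline tomorrow")
print(predict_one_against_rest(oar, query))      # work
print(predict_round_robin(oao, query))           # work
print(predict_elimination(oao, data, query))     # work
```

With n folders, round robin evaluates all n(n-1)/2 pairwise models for each message, while elimination evaluates only n-1 of them, which is the main practical difference between the two assembling strategies in this sketch.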
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xia, Y., Wong, KF. (2006). Binarization Approaches to Email Categorization. In: Matsumoto, Y., Sproat, R.W., Wong, KF., Zhang, M. (eds) Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead. ICCPOL 2006. Lecture Notes in Computer Science, vol 4285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11940098_50
DOI: https://doi.org/10.1007/11940098_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49667-0
Online ISBN: 978-3-540-49668-7
eBook Packages: Computer Science, Computer Science (R0)