Abstract
In this paper, we study the email classification problem. We apply the notion of shingling to capture the concept of phrases. For each email, we form a compact sketch, and the sketches of two emails suffice to compute their resemblance. We then apply a k-nearest neighbour algorithm to classify the emails. Experimental evaluation shows that a high degree of accuracy can be obtained.
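The pipeline described in the abstract (shingles, compact sketches, resemblance, k-nearest neighbours) can be illustrated with a minimal Python sketch. The abstract does not say how the sketches are built, so everything below is an assumption made for illustration: the function names, the parameters `w` (shingle width in words), `s` (sketch size) and `k`, the use of SHA-1 as a stand-in fingerprint, and the bottom-`s` min-hash construction used to estimate resemblance. It should not be read as the authors' implementation.

```python
import hashlib
from collections import Counter


def shingles(text, w=3):
    """Return the set of w-word shingles (contiguous word sequences) in text."""
    words = text.lower().split()
    if len(words) < w:
        # Texts shorter than w words yield a single short shingle (or none).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def _fingerprint(shingle):
    """Map a shingle to a 64-bit integer (SHA-1 is an assumed stand-in)."""
    digest = hashlib.sha1(" ".join(shingle).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def sketch(text, w=3, s=64):
    """Compact sketch: the s smallest shingle fingerprints, in sorted order."""
    return sorted({_fingerprint(sh) for sh in shingles(text, w)})[:s]


def resemblance(sk_a, sk_b, s=64):
    """Estimate the Jaccard resemblance of two emails from their sketches."""
    a, b = set(sk_a), set(sk_b)
    # The s smallest values of the merged sketches sample the union of shingles.
    union_sample = sorted(a | b)[:s]
    if not union_sample:
        return 0.0
    common = sum(1 for h in union_sample if h in a and h in b)
    return common / len(union_sample)


def classify(text, labelled_sketches, k=3, w=3, s=64):
    """Majority vote among the k training emails that most resemble text.

    labelled_sketches is a list of (sketch, label) pairs.
    """
    sk = sketch(text, w, s)
    ranked = sorted(
        labelled_sketches,
        key=lambda item: resemblance(sk, item[0], s),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

A caller would compute `sketch(...)` once per training email, store it with its folder label, and pass the resulting list to `classify`. When an email has fewer than `s` distinct shingles, the sketch holds all of its fingerprints and the estimate equals the exact Jaccard resemblance of the two shingle sets.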
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Poon, C.K., Chang, M. (2003). An Email Classifier Based on Resemblance. In: Zhong, N., Raś, Z.W., Tsumoto, S., Suzuki, E. (eds) Foundations of Intelligent Systems. ISMIS 2003. Lecture Notes in Computer Science, vol 2871. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39592-8_48
DOI: https://doi.org/10.1007/978-3-540-39592-8_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20256-1
Online ISBN: 978-3-540-39592-8
eBook Packages: Springer Book Archive