A Comparative Study of Classification Based Personal E-mail Filtering

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1805)

Abstract

This paper addresses personal E-mail filtering by casting it as a text classification problem. Modeled as semi-structured documents, E-mail messages consist of a set of fields with predefined semantics and a number of variable-length free-text fields. While most work on classification concentrates either on structured data or on free text, this paper deals with both. To perform classification, a naive Bayesian classifier was designed and implemented, and a decision-tree-based classifier was implemented. The design considerations and implementation issues are discussed. Using a relatively large amount of real personal E-mail data, a comprehensive comparative study of the two classifiers was conducted. The importance of different features is reported. Results on other issues related to building an effective personal E-mail classifier are presented and discussed. It is shown that both classifiers can perform filtering with reasonable accuracy. While the decision-tree-based classifier outperforms the Bayesian classifier when features and training size are selected optimally for both, a carefully designed naive Bayesian classifier is more robust.
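The abstract does not say how the naive Bayesian classifier combines the fixed-semantics fields with the free-text fields. As an illustration only, the sketch below assumes one simple scheme: each structured field (hypothetically `from` and `to`) contributes a single `field=value` feature, each free-text field (hypothetically `subject` and `body`) contributes word tokens prefixed with the field name, and all features are scored with a multinomial naive Bayes model using Laplace smoothing. The field names, the `tokenize` helper, the `NaiveBayesFilter` class and the toy messages are assumptions made for this example, not the authors' design or data.

```python
import math
from collections import Counter, defaultdict

# Hypothetical field names (assumption): the abstract only says messages have
# fields with predefined semantics plus variable-length free-text fields.
STRUCTURED_FIELDS = ("from", "to")
FREE_TEXT_FIELDS = ("subject", "body")


def tokenize(message):
    """Map a message (a dict of field -> string) to a list of features."""
    features = []
    for field in STRUCTURED_FIELDS:
        if field in message:
            # A structured field contributes one whole-value feature.
            features.append(f"{field}={message[field].lower()}")
    for field in FREE_TEXT_FIELDS:
        # A free-text field contributes one feature per word, prefixed with
        # the field name so subject words and body words stay distinct.
        for word in message.get(field, "").lower().split():
            features.append(f"{field}:{word}")
    return features


class NaiveBayesFilter:
    """Multinomial naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()               # training messages per class
        self.feature_counts = defaultdict(Counter)  # feature occurrences per class
        self.vocab = set()                          # all features seen in training

    def fit(self, messages, labels):
        for message, label in zip(messages, labels):
            features = tokenize(message)
            self.class_counts[label] += 1
            self.feature_counts[label].update(features)
            self.vocab.update(features)
        return self

    def log_scores(self, message):
        """Return log P(class) + sum of log P(feature | class) for each class."""
        total = sum(self.class_counts.values())
        # Features never seen during training are ignored.
        features = [f for f in tokenize(message) if f in self.vocab]
        scores = {}
        for label, n in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n / total)
            for f in features:
                score += math.log((counts[f] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, message):
        scores = self.log_scores(message)
        return max(scores, key=scores.get)


# Toy, made-up training data for illustration only.
train = [
    {"from": "boss@work.example", "subject": "meeting agenda", "body": "project deadline review"},
    {"from": "friend@mail.example", "subject": "weekend plans", "body": "movie and dinner"},
    {"from": "boss@work.example", "subject": "budget report", "body": "project costs"},
    {"from": "friend@mail.example", "subject": "photos", "body": "dinner photos from the weekend"},
]
labels = ["work", "personal", "work", "personal"]
clf = NaiveBayesFilter().fit(train, labels)

print(clf.predict({"from": "boss@work.example", "subject": "project update", "body": "deadline moved"}))
# prints: work
```

Prefixing each token with its field name keeps a word in the subject distinct from the same word in the body. Whether that distinction helps is the kind of feature question the abstract says the study examines; a different design could just as well pool all free-text words into one bag.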

The second author’s work is partially supported by a grant from the National 973 project of China (No. G1998030414).

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Diao, Y., Lu, H., Wu, D. (2000). A Comparative Study of Classification Based Personal E-mail Filtering. In: Terano, T., Liu, H., Chen, A.L.P. (eds) Knowledge Discovery and Data Mining. Current Issues and New Applications. PAKDD 2000. Lecture Notes in Computer Science, vol 1805. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45571-X_48

  • DOI: https://doi.org/10.1007/3-540-45571-X_48

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67382-8

  • Online ISBN: 978-3-540-45571-4

  • eBook Packages: Springer Book Archive
