Evaluating the Error Risk of Email Filters Based on ROC Curve Analysis

  • Chapter
Communications and Discoveries from Multidisciplinary Data

Part of the book series: Studies in Computational Intelligence ((SCI,volume 123))

Summary

Email filtering is a cost-sensitive task, because blocking a legitimate message is more harmful than letting a spam message through. It is therefore important to evaluate the error risk of a filter trained on a given labeled dataset. This paper surveys research on Receiver Operating Characteristic (ROC) curve analysis. Using experimental results for four filters on four publicly available corpora, we then discuss how the techniques of ROC curve analysis can be used to evaluate the risk of email filters. In our view, this work is useful for designing a practical, everyday filter.
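As a rough illustration of the kind of analysis the summary describes, the sketch below builds a ROC curve from filter scores, computes the area under it, and picks the operating point with the lowest cost when blocking a legitimate message is penalized more heavily than passing spam. It is not the chapter's method. The function names, the toy scores and labels, and the 9:1 cost ratio are all hypothetical choices made for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_points(scores: List[float], labels: List[int]) -> List[Point]:
    """Sweep the decision threshold from high to low and collect ROC points.

    labels: 1 = spam (positive), 0 = legitimate (negative).
    A message is flagged as spam when its score >= threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points: List[Point] = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Consume every message tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Point]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def min_cost_point(points: List[Point], pos: int, neg: int,
                   cost_fp: float, cost_fn: float) -> Point:
    """Return the ROC point with the lowest total misclassification cost.

    cost_fp: cost of flagging a legitimate message as spam.
    cost_fn: cost of letting a spam message through.
    """
    def cost(p: Point) -> float:
        fpr, tpr = p
        return cost_fp * fpr * neg + cost_fn * (1.0 - tpr) * pos
    return min(points, key=cost)


# Toy data (hypothetical): filter scores and true labels for six messages.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
area = auc(points)
# Assumed cost ratio: blocking legitimate mail is 9 times worse than passing spam.
best = min_cost_point(points, pos=3, neg=3, cost_fp=9.0, cost_fn=1.0)
print(f"AUC = {area:.4f}, cost-optimal (FPR, TPR) = {best}")
```

With a high false-positive cost, the selected operating point sits at the left edge of the curve, where no legitimate message is blocked even though some spam gets through. Lowering `cost_fp` moves the preferred point toward higher spam recall.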

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Li, W., Zhong, N., Liu, C. (2008). Evaluating the Error Risk of Email Filters Based on ROC Curve Analysis. In: Iwata, S., Ohsawa, Y., Tsumoto, S., Zhong, N., Shi, Y., Magnani, L. (eds) Communications and Discoveries from Multidisciplinary Data. Studies in Computational Intelligence, vol 123. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78733-4_18

  • DOI: https://doi.org/10.1007/978-3-540-78733-4_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78732-7

  • Online ISBN: 978-3-540-78733-4

  • eBook Packages: Engineering, Engineering (R0)
