Summary
Email filtering is a cost-sensitive task, because losing a legitimate message is more harmful than the opposite error. Evaluating the error risk of a filter trained on a given labeled dataset is therefore important for this task. This paper surveys research on Receiver Operating Characteristic (ROC) curve analysis. Using the experimental results of four filters compared on four publicly available corpora, we discuss how to use ROC curve analysis techniques to evaluate the risk of email filters. In our view, this work is useful for designing a practical, everyday filter.
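As a rough illustration of the kind of ROC analysis the summary refers to, the following Python sketch builds an ROC curve from filter scores and computes the area under it. It is not the chapter's method: the function names (roc_points, auc, tpr_at_fpr), the label convention (1 = spam, 0 = legitimate), and the toy scores are all assumptions made for this example. The last helper reflects the cost-sensitive setting by reporting how much spam is caught while the rate of legitimate mail flagged as spam stays under a chosen bound.

```python
def roc_points(scores, labels):
    """Sweep the decision threshold and return a list of (fpr, tpr) points.

    Assumed convention: label 1 = spam (positive), 0 = legitimate (negative);
    a higher score means the filter considers the message more spam-like.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one spam and one legitimate message")

    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Messages with the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def tpr_at_fpr(points, max_fpr):
    """Best spam recall among operating points whose false positive rate
    (legitimate mail flagged as spam) does not exceed max_fpr."""
    return max(tpr for fpr, tpr in points if fpr <= max_fpr)


# Toy data, invented for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]

curve = roc_points(scores, labels)
print("AUC:", auc(curve))
print("Spam recall with no legitimate mail lost:", tpr_at_fpr(curve, 0.0))
```

Comparing filters by tpr_at_fpr at a small max_fpr, rather than by AUC alone, is one simple way to account for the asymmetric cost of the two error types.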
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Li, W., Zhong, N., Liu, C. (2008). Evaluating the Error Risk of Email Filters Based on ROC Curve Analysis. In: Iwata, S., Ohsawa, Y., Tsumoto, S., Zhong, N., Shi, Y., Magnani, L. (eds) Communications and Discoveries from Multidisciplinary Data. Studies in Computational Intelligence, vol 123. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78733-4_18
DOI: https://doi.org/10.1007/978-3-540-78733-4_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78732-7
Online ISBN: 978-3-540-78733-4
eBook Packages: Engineering, Engineering (R0)