Abstract
We discuss a classification-based approach for filtering phishing messages in an e-mail stream. Upon arrival, various features of every e-mail are extracted. These features form the basis of a classification process which detects potentially harmful phishing messages. We introduce various new features for identifying phishing messages and rank established as well as newly introduced features according to their significance for this classification problem. Moreover, in contrast to classical binary classification approaches (spam vs. not spam), a more refined ternary classification approach for filtering e-mail data is investigated which automatically distinguishes three message types: ham (solicited e-mail), spam, and phishing.
Experiments with representative data sets illustrate that our approach yields better classification results than existing phishing detection methods. Moreover, the direct ternary classification proposed is compared to a sequence of two binary classification processes. Direct one-step ternary classification is not only more efficient, but is also shown to achieve better accuracy than repeated binary classification.
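To make the comparison between direct ternary classification and a sequence of two binary classifications concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not name the classifier or the features, so everything here is an assumption made for illustration: a simple nearest-centroid classifier stands in for the learning algorithm, and the three features (number of links, whether a link uses a raw IP address, count of urgency words) and all sample values are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy training set: (num_links, has_ip_url, urgency_words) -> label.
# The features and values are illustrative assumptions, not taken from the paper.
TRAIN = [
    ((0, 0, 0), "ham"),
    ((1, 0, 0), "ham"),
    ((5, 0, 1), "spam"),
    ((6, 0, 0), "spam"),
    ((2, 1, 3), "phishing"),
    ((3, 1, 4), "phishing"),
]


def train_centroids(samples):
    """Return the mean feature vector per label (nearest-centroid model)."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        for i, value in enumerate(vec):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, vec):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, vec))
    return min(centroids, key=lambda label: dist2(centroids[label]))


def direct_ternary(train):
    """One model, one step: ham vs. spam vs. phishing."""
    model = train_centroids(train)
    return lambda vec: predict(model, vec)


def cascaded_binary(train):
    """Two binary models: ham vs. unsolicited, then spam vs. phishing."""
    stage1 = train_centroids(
        [(v, "ham" if label == "ham" else "unsolicited") for v, label in train]
    )
    stage2 = train_centroids([(v, label) for v, label in train if label != "ham"])

    def classify(vec):
        if predict(stage1, vec) == "ham":
            return "ham"
        # Only messages flagged as unsolicited reach the second classifier.
        return predict(stage2, vec)

    return classify


if __name__ == "__main__":
    direct = direct_ternary(TRAIN)
    cascade = cascaded_binary(TRAIN)
    for message in [(0, 0, 0), (6, 0, 0), (3, 1, 4)]:
        print(message, direct(message), cascade(message))
```

In this sketch the cascade trains and evaluates two models for every unsolicited message, while the direct variant needs a single pass; that is the efficiency difference the abstract refers to. On such a small toy set both variants agree, so the sketch says nothing about the accuracy difference reported in the paper.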
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gansterer, W.N., Pölz, D. (2009). E-Mail Classification for Phishing Defense. In: Boughanem, M., Berrut, C., Mothe, J., Soule-Dupuy, C. (eds) Advances in Information Retrieval. ECIR 2009. Lecture Notes in Computer Science, vol 5478. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00958-7_40
DOI: https://doi.org/10.1007/978-3-642-00958-7_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00957-0
Online ISBN: 978-3-642-00958-7
eBook Packages: Computer Science (R0)