Assessing the severity of phishing attacks: A hybrid data mining approach

https://doi.org/10.1016/j.dss.2010.08.020

Abstract

Phishing is an online crime that increasingly plagues firms and their consumers. We assess the severity of phishing attacks in terms of their risk levels and the potential loss in market value suffered by the targeted firms. We analyze 1030 phishing alerts published in a public database, together with financial data on the targeted firms, using a hybrid method that combines text phrase extraction with supervised classification and predicts the severity of an attack with up to 89% accuracy. Our research identifies important textual and financial variables that affect the severity of the attacks and the potential financial loss.

Research highlights

► Risk level and CAR level were complementary severity measures for phishing attacks.
► Using both textual data and financial data makes severity prediction more accurate.
► Data mining techniques are effective for assessing the severity of phishing attacks.

Introduction

Phishing is a major security threat to the online community. It is a form of identity theft that uses social engineering skills and technical subterfuge to entice unsuspecting online consumers into giving away their personal information and financial credentials [5]. A typical phishing attack consists of four phases, namely, preparation, mass broadcast, mature, and account hijack [8]. The tremendous financial impact of phishing is borne out by an estimated loss of US $3.2 billion, affecting 3.6 million people, between September 2006 and August 2007 [40]. The number of reported phishing incidents grew rapidly, rising by 293.7% from 8829 in December 2004 to 34,758 in October 2008 [4], [5]. Phishing attacks not only cause financial loss but also shatter customers' confidence in conducting e-commerce. Managers of some US super-regional banks have indicated that deteriorating customer trust is a major concern with respect to phishing [46]. A recent survey found that most customers of European banks use online banking only to check their account balances, rather than to conduct online transactions, for fear of being phished [15]. Another study reported that this customer anxiety has led to a 20% decrease in the rate at which genuine emails are opened [10].
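The 293.7% growth figure follows directly from the two incident counts cited above:

$$
\frac{34{,}758 - 8{,}829}{8{,}829} = \frac{25{,}929}{8{,}829} \approx 2.937 \;(= 293.7\%)
$$

In other words, the October 2008 count was about 3.94 times the December 2004 count.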

To make customers aware of the latest phishing attacks, some international organizations and government statutory bodies, such as the Anti-Phishing Working Group (APWG), have published phishing alerts on their websites. To assess the risk level of each phishing attack, some firms have sought help from information security experts, who evaluate reported phishing incidents based on the content of the phishing emails and phishing websites. However, as phishing incidents continue to increase at a tremendous rate, manual risk assessment by experts may be too slow. Data mining techniques can improve the assessment of phishing attacks. They can discover the knowledge embedded in the traits of prior phishing attacks and identify the inherent characteristics that contribute to different risk levels, which can help predict the risk level of a new phishing incident quickly and with reasonable accuracy. Furthermore, the risk level, which is based on the technical sophistication of a phishing attack, may not be directly related to the financial loss the attack causes. Past research has shown that the stock market impact of alerts about sophisticated phishing attacks is not as significant as that of alerts whose risk level is considered moderate [33]. However, the financial loss resulting from a phishing attack is always of great concern to an organization's security administrators and consumers. Therefore, a warning mechanism that can identify phishing incidents that are either very risky or likely to cause a large financial loss will be of great interest to the shareholders and senior managers of the targeted companies.
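The stock market impact of a phishing alert is typically measured in an event study as a cumulative abnormal return (CAR), the measure behind the "CAR level" in the research highlights. As a sketch, assuming the standard market-model specification (the text shown here does not give the authors' exact model, estimation window, or event window):

$$
R_{it} = \alpha_i + \beta_i R_{mt} + \varepsilon_{it}, \qquad
AR_{it} = R_{it} - \left(\hat{\alpha}_i + \hat{\beta}_i R_{mt}\right), \qquad
CAR_i(t_1, t_2) = \sum_{t = t_1}^{t_2} AR_{it}
$$

Here $R_{it}$ is the return of firm $i$'s stock on day $t$, $R_{mt}$ is the market index return, $\hat{\alpha}_i$ and $\hat{\beta}_i$ are estimated by regressing $R_{it}$ on $R_{mt}$ over a pre-event estimation period, and $[t_1, t_2]$ is an event window around the alert date. A negative $CAR_i$ indicates a loss in market value for the targeted firm.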

In this research we use supervised classification techniques, a major stream of data mining, to assess the severity of phishing attacks. At the same time, we identify the key antecedents that lead a phishing attack to have a high risk level or to generate a high financial loss. Our hybrid approach combines key phrase extraction with supervised classification, using both the textual description of each phishing attack and financial data on the targeted company to assess the severity of the attack according to its risk level or its potential to generate financial loss. The three classifiers used for this purpose achieve a classification accuracy of up to 89%. Our results also show that the key variables identifying the risk level of phishing attacks differ from those identifying their potential financial loss. A high risk level is associated with phishing emails that ask customers of large firms to update their accounts, whereas a high financial loss is characteristic of phishing attacks that target customers of large firms with high total liabilities.
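A minimal sketch of the hybrid idea, assuming that binary key-phrase indicators and already-standardized financial variables are simply concatenated into one feature vector and fed to a single classifier. The three classifiers are not named in this passage, so plain logistic regression stands in for them here; `VOCABULARY`, `build_features`, `train_logistic`, the toy alerts, and the liability values are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical phrase vocabulary; in practice it would come from a
# key phrase extraction step run over the alert texts.
VOCABULARY = ["update your account", "verify", "suspended"]


def build_features(alert_text: str, financials: Sequence[float]) -> List[float]:
    """Concatenate binary phrase indicators with pre-standardized financial variables."""
    text = alert_text.lower()
    phrase_part = [1.0 if phrase in text else 0.0 for phrase in VOCABULARY]
    return phrase_part + list(financials)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(X: List[List[float]], y: List[int],
                   lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    """Fit logistic regression by full-batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: List[float], b: float, x: List[float]) -> int:
    """Return 1 (high severity) or 0 (low severity)."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Toy, made-up alerts: (text, [standardized total liabilities], severity label).
alerts = [
    ("Please update your account details immediately", [1.2], 1),
    ("We need you to update your account to avoid closure", [0.8], 1),
    ("Your mailbox is full, click here", [-0.5], 0),
    ("Verify your recent purchase", [-1.0], 0),
]
X_train = [build_features(text, fin) for text, fin, _ in alerts]
y_train = [label for _, _, label in alerts]
w, b = train_logistic(X_train, y_train)

new_alert = build_features("Kindly update your account now", [1.0])
print(predict(w, b, new_alert))
```

Concatenating the two feature groups keeps the model simple and lets one set of learned weights show which textual and which financial variables push an alert toward high severity; in a real pipeline the financial variables would be standardized with statistics from the training data only.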

Section snippets

Literature review

Phishing has aroused great interest among information security researchers. Understanding the critical success factors of phishing and determining methods that can prevent or detect such a crime has been a popular area of research. We can roughly split current research on phishing into three streams, namely, phenomenal studies, economic analysis, and technical research.

As an example of a phenomenal study related to phishing, Jagatic et al. found that the social engineering skill of the

Theoretical framework

A theoretical model of the impact of any type of threat on corporate performance, called the risk-components model, was proposed by Crockford [13]. The model posits that threats may compromise corporate resources and thus negatively affect firm performance. This impact may be reflected as a drop in the earnings or market value of the firm. We adopted this model to assess the impact of security threats. Loch et al. categorized security threats according to their potential

Data collection and analysis

In this section we describe how we collect, prepare, and analyze phishing alerts to assess their severity and to determine the important antecedents that influence the classification.
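One simple way to turn free-text alerts into phrase features is to collect word n-grams that neither begin nor end with a stop word and to keep those that recur across alerts. This is a sketch under stated assumptions, not the authors' extraction procedure, which is not described in the text shown here: `STOP_WORDS`, the n-gram limit `max_n`, the document-frequency cutoff `min_df`, and the sample alerts are hypothetical.

```python
import re
from collections import Counter
from typing import List, Set

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "the", "to", "your", "of", "and", "is", "has", "been", "please", "now"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def candidate_phrases(tokens: List[str], max_n: int = 3) -> Set[str]:
    """All word n-grams (n <= max_n) that neither start nor end with a stop word."""
    phrases: Set[str] = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            phrases.add(" ".join(gram))
    return phrases


def extract_key_phrases(alert_texts: List[str], min_df: int = 2, max_n: int = 3) -> List[str]:
    """Keep candidate phrases that occur in at least `min_df` different alerts."""
    doc_freq: Counter = Counter()
    for text in alert_texts:
        # candidate_phrases returns a set, so each alert counts a phrase at most once.
        doc_freq.update(candidate_phrases(tokenize(text), max_n))
    return sorted(p for p, count in doc_freq.items() if count >= min_df)


sample_alerts = [
    "Please update your account to avoid suspension",
    "Update your account information now",
    "Your account has been suspended",
]
print(extract_key_phrases(sample_alerts))  # ['account', 'update', 'update your account']
```

Allowing stop words inside a phrase but not at its edges keeps multi-word cues such as "update your account" while discarding fragments like "your account".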

Results

In this section, we present the results obtained by applying the trained classification models to the validation data. We evaluated the classification accuracy of the models and then identified the important variables discovered by the models for the two classification tasks.
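Classification accuracy here is the share of validation alerts whose predicted severity class matches the actual class, and a confusion count shows where the errors fall. A minimal sketch with made-up labels and predictions (the paper's validation data and per-class results are not reproduced here):

```python
from collections import Counter
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of validation alerts whose predicted class equals the actual class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Counter:
    """Count (actual, predicted) pairs; off-diagonal pairs are misclassifications."""
    return Counter(zip(y_true, y_pred))


# Made-up validation labels (1 = high severity, 0 = low severity) and predictions.
y_val = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
y_hat = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
print(accuracy(y_val, y_hat))  # 0.8
print(confusion_counts(y_val, y_hat))
```

Reporting the confusion counts alongside accuracy matters when severity classes are imbalanced, since a model can score well on accuracy while missing most high-severity alerts.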

Discussion

Information security analysts generally assess the severity of phishing attacks on the basis of technical sophistication. The financial loss likely to result from phishing attacks is rarely estimated. However, senior managers and shareholders of companies are keenly interested in the economic impact of phishing attacks. Keeping in mind that it is important to evaluate both the technical sophistication and the potential financial impact of phishing attacks, we conducted this

Conclusion

Phishing has become one of the biggest threats to the online community, and many researchers have explored ways to deter such crime. Information security specialists and anti-phishing organizations have set up phishing alert databases that assess each reported phishing incident in terms of its risk level. In view of the increasing number of reported phishing incidents, we believe that such a manual assessment approach is not efficient enough to provide timely reports, and is also not complete as


References (55)

  • Z. Ma et al. Discovering company revenue relations from news: a network approach. Decision Support Systems (2009).
  • E. Airoldi et al. Data mining challenges for electronic safety: the case of fraudulent intent detection in e-mails.
  • A.Q. Ansari et al. Integrated fuzzy logic and data mining: impact on cyber security.
  • APWG
  • APWG
  • H. Berghel. Phishing mongers and posers. Communications of the ACM (2006).
  • I. Bose et al. Unveiling the mask of phishing: threats, preventive measures, and responsibilities. Communications of the Association for Information Systems (2007).
  • A. Brandt. Phishing anxiety may make you miss messages. PC World (2005).
  • C.J.C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery (1998).
  • M. Chandrasekaran et al. Phishing e-mail detection based on structural properties.
  • N. Crockford. An Introduction to Risk Management (1986).
  • R. Dhamija et al. Phish and HIPs: human interactive proofs to detect phishing attacks.
  • B. Ensor et al.
  • R. Feldman et al. The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data (2007).
  • C. Fellbaum. WordNet: An Electronic Lexical Database (1998).
  • I. Fette et al. Learning to detect phishing emails.
  • W.N. Gansterer et al. E-mail classification for phishing defense. Advances in Information Retrieval.


Xi Chen is a lecturer of Information Systems at the School of Management, Zhejiang University, China. He obtained his BS (Management Information Systems) from Fudan University, MS (Information Systems) from the National University of Singapore, and Ph.D. (Information Systems) from the University of Hong Kong. His research interests are in the areas of data mining, mobile services, and churn management. His research has appeared or is forthcoming in Decision Support Systems, European Journal of Operational Research, Journal of Organizational Computing and Electronic Commerce, Journal of the American Society for Information Science and Technology, Electronic Commerce Research and Applications, and in the proceedings of several international conferences.

Indranil Bose is an associate professor of Information Systems at the School of Business, The University of Hong Kong. He holds a B. Tech. from the Indian Institute of Technology, an MS from the University of Iowa, and an MS and Ph.D. from Purdue University. His research interests are in telecommunications, data mining, information security, and supply chain management. His publications have appeared in Communications of the ACM, Communications of AIS, Computers and Operations Research, Decision Support Systems, Ergonomics, European Journal of Operational Research, Information & Management, Journal of Organizational Computing and Electronic Commerce, Journal of the American Society for Information Science and Technology, and Operations Research Letters. He is listed in the International Who's Who of Professionals 2005–2006, Marquis Who's Who in the World 2006, Marquis Who's Who in Asia 2007, Marquis Who's Who in Science and Engineering 2007, and Marquis Who's Who of Emerging Leaders 2007. He serves on the editorial boards of Information & Management, Communications of AIS, and several other IS journals.

Alvin Chung Man Leung is currently a Ph.D. student at the McCombs School of Business, The University of Texas at Austin, specializing in Information Systems. He obtained his MPhil, BBA(IS), and BEng(SE) degrees from The University of Hong Kong. His research interests include social networks, information security, and data mining. His publications have appeared in international journals and conference proceedings such as Decision Support Systems, Communications of the ACM, Communications of the AIS, and the International Conference on Information Systems (ICIS). He also received the Best Student Paper Award at the International MultiConference of Engineers and Computer Scientists 2008.

(Julian) Chenhui Guo is a Ph.D. student in the Department of MIS, Eller College of Management, The University of Arizona. He obtained his Bachelor of Business Administration degree from Zhejiang University, Hangzhou, China. His current research interests include data/text mining, business intelligence, and behavioral science in IS usage.
