A Novel Machine Learning Methodology for Detecting Phishing Attacks in Real Time

  • Conference paper
Security and Trust Management (STM 2020)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 12386)

Abstract

Phishing is a cybercriminal activity in which the attacker masquerades as a trusted entity and targets legitimate users to obtain personal information illegally. Many phishing detection techniques have been proposed, based on blacklists/whitelists, heuristics, search engines, visual similarity, and machine learning. Statistics indicate that the average lifespan of a phishing website is 8–10 h, which makes it difficult for most of these techniques to identify and detect it accurately. Blacklist/whitelist and search-engine-based techniques work in real time but may fail to handle zero-day phishing attacks. To tackle this problem, an approach is needed that studies the dynamic behavior of websites and accurately predicts new phishing websites. Machine learning has been used in the past to handle the dynamic behavior of phishing websites. In this paper, we propose a method in which a browser extension makes an API call to a pre-trained machine learning model to fetch the results, thus making machine learning work in real time. Six machine learning classifiers have been rigorously trained and tested on a dataset of 5430 legitimate URLs and 5147 phishing URLs. We use a novel feature in which HTTPS URLs can be accurately identified as phishing or legitimate based on certificate validation. The method also detects phishing websites hidden behind short URLs as well as behind normal URLs, which makes it more robust. The methodology has a response time of 1.74 s and an accuracy of 99.93%, which is better than previous work.
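The flow described in the abstract (expand a short URL, check the certificate for HTTPS URLs, then query a pre-trained model) might be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the shortener host list, the three-element feature vector, and the `expand`, `cert_is_valid`, and `predict` callables are all hypothetical stand-ins, since the abstract does not specify the actual features, API, or classifiers' interfaces.

```python
from dataclasses import dataclass
from typing import Callable, List
from urllib.parse import urlparse

# Hypothetical shortener hosts; the paper's actual list is not given in the abstract.
SHORTENER_HOSTS = {"bit.ly", "tinyurl.com", "t.co", "goo.gl"}


@dataclass
class Features:
    is_https: bool
    cert_valid: bool
    was_shortened: bool


def extract_features(
    url: str,
    expand: Callable[[str], str],          # assumed: resolves a short URL to its target
    cert_is_valid: Callable[[str], bool],  # assumed: validates the host's certificate
) -> Features:
    host = (urlparse(url).hostname or "").lower()
    was_shortened = host in SHORTENER_HOSTS
    if was_shortened:
        # Classify the destination, not the shortener itself.
        url = expand(url)
    parsed = urlparse(url)
    is_https = parsed.scheme == "https"
    # Certificate validation only applies to HTTPS URLs.
    cert_valid = is_https and cert_is_valid(parsed.hostname or "")
    return Features(is_https, cert_valid, was_shortened)


def classify(
    url: str,
    expand: Callable[[str], str],
    cert_is_valid: Callable[[str], bool],
    predict: Callable[[List[int]], int],   # assumed: stands in for the model API call
) -> str:
    feats = extract_features(url, expand, cert_is_valid)
    vector = [int(feats.is_https), int(feats.cert_valid), int(feats.was_shortened)]
    return "phishing" if predict(vector) == 1 else "legitimate"
```

In a deployment along the lines the abstract describes, `predict` would presumably be an HTTP request from the browser extension to a server hosting the trained classifier; it is injected here as a plain callable so the sketch stays self-contained and can be exercised with stubs.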

Corresponding author

Correspondence to Vishal Arora.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Arora, V., Misra, M. (2020). A Novel Machine Learning Methodology for Detecting Phishing Attacks in Real Time. In: Markantonakis, K., Petrocchi, M. (eds) Security and Trust Management. STM 2020. Lecture Notes in Computer Science, vol 12386. Springer, Cham. https://doi.org/10.1007/978-3-030-59817-4_3

  • DOI: https://doi.org/10.1007/978-3-030-59817-4_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-59816-7

  • Online ISBN: 978-3-030-59817-4

  • eBook Packages: Computer Science (R0)