A Novel Machine Learning Methodology for Detecting Phishing Attacks in Real Time

Arora, Vishal; Misra, Manoj

doi:10.1007/978-3-030-59817-4_3

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 12386)

Included in the following conference series:

International Workshop on Security and Trust Management

Abstract

Phishing is a cybercriminal activity in which the criminal masquerades as a trusted entity and attacks legitimate users to obtain their personal information illegally. Many phishing detection techniques have been proposed in the past, based on blacklists/whitelists, heuristics, search engines, visual similarity, and machine learning. Statistics indicate that the average lifespan of a phishing website is 8–10 h, which makes it difficult for most of these techniques to identify and detect it accurately. Blacklist/whitelist and search-engine-based techniques work in real time but may fail to handle zero-day phishing attacks. To tackle this problem, an approach is needed that studies the dynamic behavior of websites and accurately predicts new phishing websites. Machine learning has been used in the past to handle the dynamic behavior of phishing websites. In this paper, we propose a method in which a browser extension makes an API call to a pre-trained machine learning model to fetch the results, thus making machine learning work in real time. Six machine learning classifiers have been rigorously trained and tested on a dataset of 5430 legitimate URLs and 5147 phished URLs. We use a novel feature with which HTTPS URLs can be accurately identified as phished or legitimate based on certificate validation. The method also detects phishing websites hidden behind short URLs as well as normal URLs, which makes it more robust. This methodology has a quick response time of 1.74 s and an accuracy of 99.93%, which is better than previous work.
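The abstract names two URL-level signals: certificate validation for HTTPS URLs and detection of short URLs. The following is a minimal, hypothetical Python sketch (standard library only) of how such signals might be computed before a URL is passed to a classifier. It is not the authors' implementation. The shortener list, the function and feature names, the extra lexical features, and the choice to treat an unreachable host as failing validation are all assumptions.

```python
import ipaddress
import socket
import ssl
import urllib.request
from typing import Callable, Dict
from urllib.parse import urlsplit

# Hypothetical, deliberately incomplete list of URL-shortener domains (assumption).
SHORTENER_DOMAINS = {"bit.ly", "tinyurl.com", "goo.gl", "t.co", "ow.ly", "is.gd"}


def certificate_is_valid(hostname: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TLS handshake with default verification succeeds.

    ssl.create_default_context() verifies the certificate chain against the
    system trust store and checks that the certificate matches the hostname.
    Any TLS or network failure counts as "not valid" (an assumption).
    """
    if not hostname:
        return False
    context = ssl.create_default_context()
    try:
        with socket.create_connection((hostname, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=hostname):
                return True
    except OSError:  # ssl.SSLError and socket timeouts are subclasses of OSError
        return False


def expand_short_url(url: str, timeout: float = 5.0) -> str:
    """Follow HTTP redirects and return the final URL (needs network access;
    raises urllib.error.URLError on failure)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.geturl()


def is_short_url(url: str) -> bool:
    """True if the URL's host is in the hypothetical shortener list."""
    host = (urlsplit(url).hostname or "").lower()
    return host in SHORTENER_DOMAINS


def host_is_ip(host: str) -> bool:
    """True if the host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_features(
    url: str, cert_check: Callable[[str], bool] = certificate_is_valid
) -> Dict[str, int]:
    """Map a URL to a small numeric feature vector (illustrative feature set)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    uses_https = parts.scheme.lower() == "https"
    # Only HTTPS URLs are checked; HTTP URLs (and missing hosts) get cert_valid = 0.
    # The check uses the default port 443 even if the URL names another port.
    cert_ok = uses_https and bool(host) and cert_check(host)
    return {
        "uses_https": int(uses_https),
        "cert_valid": int(cert_ok),
        "is_short_url": int(is_short_url(url)),
        "host_is_ip": int(host_is_ip(host)),
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "has_at_symbol": int("@" in url),
    }
```

Passing `cert_check` as a parameter keeps the network call out of the feature logic, so `extract_features("https://bit.ly/abc", cert_check=lambda host: True)` can run offline. In a deployment like the one described, one plausible arrangement is for the browser extension to send the URL to a server that runs these steps and returns the pre-trained model's prediction; a short URL could first be resolved with `expand_short_url` so that the destination, rather than the shortener, is scored.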

Author information

Authors and Affiliations

Indian Institute of Technology Roorkee, Roorkee, 247667, India
Vishal Arora & Manoj Misra

Corresponding author

Correspondence to Vishal Arora.

Editor information

Editors and Affiliations

Royal Holloway, University of London, Egham, UK
Konstantinos Markantonakis
National Research Council, Pisa, Italy
Marinella Petrocchi

About this paper

Cite this paper

Arora, V., Misra, M. (2020). A Novel Machine Learning Methodology for Detecting Phishing Attacks in Real Time. In: Markantonakis, K., Petrocchi, M. (eds) Security and Trust Management. STM 2020. Lecture Notes in Computer Science, vol 12386. Springer, Cham. https://doi.org/10.1007/978-3-030-59817-4_3

DOI: https://doi.org/10.1007/978-3-030-59817-4_3
Published: 16 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59816-7
Online ISBN: 978-3-030-59817-4
eBook Packages: Computer Science, Computer Science (R0)
