Abstract
Phishing is a cybercriminal activity in which the attacker masquerades as a trusted entity and targets unsuspecting users to obtain personal information illegally. Many phishing detection techniques have been proposed, based on blacklists/whitelists, heuristics, search engines, visual similarity, and machine learning. Statistics indicate that the average lifespan of a phishing website is 8–10 h, which makes it difficult for most of these techniques to detect such sites accurately. Blacklist/whitelist and search engine based techniques work in real time but may fail to handle zero-day phishing attacks. To tackle this problem, an approach is needed that studies the dynamic behavior of websites and accurately predicts new phishing websites. Machine learning has been used in the past to handle the dynamic behavior of phishing websites. In this paper, we propose a method in which a browser extension makes an API call to a pre-trained machine learning model to fetch the results, thus making machine learning work in real time. Six machine learning classifiers have been rigorously trained and tested on a dataset of 5430 legitimate URLs and 5147 phishing URLs. We use a novel feature by which HTTPS URLs can be accurately identified as phishing or legitimate based on certificate validation. The method also detects phishing websites hidden behind short URLs in addition to normal URLs, which makes it more robust. It has a response time of 1.74 s and an accuracy of 99.93%, which is better than previous works.
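The certificate-validation and short-URL ideas mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the shortener host list, and the 1 / -1 / 0 feature encoding are all assumptions made for illustration. The network check is injectable so that the feature logic can be exercised without a live connection.

```python
import socket
import ssl
from urllib.parse import urlparse

# Hypothetical, illustrative list of URL-shortener hosts (not from the paper).
SHORTENER_HOSTS = {"bit.ly", "tinyurl.com", "t.co", "goo.gl"}


def is_short_url(url: str) -> bool:
    """Return True when the URL's host is one of the listed shortening services."""
    host = (urlparse(url).hostname or "").lower()
    return host in SHORTENER_HOSTS


def has_valid_certificate(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Attempt a TLS handshake with default verification; False on any failure."""
    context = ssl.create_default_context()  # verifies the chain and the hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:  # covers ssl.SSLError, timeouts, and DNS failures
        return False


def certificate_feature(url: str, checker=has_valid_certificate) -> int:
    """Assumed encoding: 1 = valid certificate, -1 = invalid, 0 = not HTTPS."""
    parts = urlparse(url)
    if parts.scheme != "https" or not parts.hostname:
        return 0
    return 1 if checker(parts.hostname) else -1
```

In a fuller pipeline, a short URL would presumably be expanded to its destination before features such as this one are computed and passed to the classifier; that step is omitted here because the abstract does not describe how it is done.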
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Arora, V., Misra, M. (2020). A Novel Machine Learning Methodology for Detecting Phishing Attacks in Real Time. In: Markantonakis, K., Petrocchi, M. (eds) Security and Trust Management. STM 2020. Lecture Notes in Computer Science, vol 12386. Springer, Cham. https://doi.org/10.1007/978-3-030-59817-4_3
DOI: https://doi.org/10.1007/978-3-030-59817-4_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59816-7
Online ISBN: 978-3-030-59817-4
eBook Packages: Computer Science, Computer Science (R0)