A CNN-Based SIA Screenshot Method to Visually Identify Phishing Websites

Journal of Network and Systems Management

Abstract

Phishing is evolving rapidly and causes substantial damage to finances, brand reputation, and privacy. Various phishing detection methods have been proposed as phishing has grown, but open research issues remain. Phishing websites steal users’ information mainly through visual deception, and deep learning methods have proven very effective in computer vision applications, yet little research applies deep learning algorithms to visual analysis of phishing sites. Moreover, most studies use balanced datasets, which does not reflect a real Web environment. This paper therefore proposes a security indicator area (SIA), which contains most of the security indicators designed to help users identify phishing sites. The proposed method takes screenshots of the SIA and uses a convolutional neural network (CNN) as a classifier. To demonstrate the efficiency of the proposed method, this paper carries out several comparative experiments on an unbalanced dataset with far fewer phishing sites, which makes detection more difficult but also closer to reality. The results show that the proposed method achieves the highest F1-score among the compared methods, while offering advantages in detection efficiency and data expansibility.
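The abstract's emphasis on F1-score for an unbalanced dataset can be illustrated with a minimal sketch. The class counts and predictions below are hypothetical and are not taken from the paper's experiments; the helper names (`confusion_counts`, `f1_score`, `accuracy`) are assumptions introduced only for this illustration. The sketch shows why accuracy alone can look strong when phishing sites are rare, while F1 on the phishing class exposes a classifier that never flags anything.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) treating `positive` as the phishing class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical unbalanced set: 95 legitimate sites (0) and 5 phishing sites (1).
y_true = [0] * 95 + [1] * 5

# Baseline that labels every site as legitimate.
always_legit = [0] * 100

# Hypothetical detector: 2 false alarms, 4 of the 5 phishing sites caught.
detector = [1] * 2 + [0] * 93 + [1] * 4 + [0] * 1

for name, y_pred in [("always_legit", always_legit), ("detector", detector)]:
    print(name, round(accuracy(y_true, y_pred), 3), round(f1_score(y_true, y_pred), 3))
```

In this made-up case the always-legitimate baseline reaches 95% accuracy but an F1-score of 0, whereas the detector's F1-score reflects both its missed phishing site and its false alarms.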

Funding

This work was supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-00796, Research on Foundational Technologies for 6G Autonomous Security-by-Design to Guarantee Constant Quality of Security).

Author information

Contributions

Liu analyzed visual counterfeiting of phishing websites, and proposed the Security Indicator Area (SIA) as an input, which utilizes visual analysis and makes the input interpretable. Liu and Lee carried out several comparative experiments on a constructed unbalanced dataset. Liu and Lee reviewed the manuscript. Lee is the corresponding author of this paper.

Corresponding author

Correspondence to Jong-Hyouk Lee.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, DJ., Lee, JH. A CNN-Based SIA Screenshot Method to Visually Identify Phishing Websites. J Netw Syst Manage 32, 8 (2024). https://doi.org/10.1007/s10922-023-09784-7

  • DOI: https://doi.org/10.1007/s10922-023-09784-7
