Automatic detection of phishing pages with event-based request processing, deep-hybrid feature extraction and light gradient boosted machine model

Telecommunication Systems

Abstract

Phishing attacks, in which cyber attackers target unwary users, are a serious threat to cyber security. It is important to quickly distinguish phishing web pages from legitimate ones. Although the studies in the literature detect phishing successfully, the problem of a high false-positive rate after the web page request has been processed remains to be resolved. The novelty of this study is that the classification of deep-hybrid features with the Light Gradient Boosted Machine (LightGBM) model is triggered as an event when the web address is entered in the address bar of the browser. Thus, phishing can be detected at every request entry, before the request is completed. In the proposed approach, normalized features from web page requests are fed to Sparse Autoencoder (SAE) and Principal Component Analysis (PCA) methods, which together produce the encoded deep-hybrid features. The LightGBM classifier can effectively distinguish legitimate pages from phishing attacks using these features. The ISCX-URL phishing dataset is used to measure and validate the performance of the proposed approach. The proposed method classifies the SAE-PCA encoded features with the LightGBM model at an accuracy of 99.6% within the event. The results show that the proposed approach achieves better classification performance metrics than most other methods. Compared to other models, this accuracy contributes to solving the false-positive problem before requests are processed.
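The feature-extraction step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the author's implementation: the abstract does not say how the SAE and PCA encodings are combined, so concatenating them is an assumption here, as are the layer size, the L1 sparsity penalty, the learning rate, the number of components, and the random demo data. All function names are hypothetical.

```python
import numpy as np

def min_max_normalize(X):
    """Scale every feature column to [0, 1]; constant columns map to 0."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span

def pca_encode(X, n_components):
    """Project centered data onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sae_encode(X, n_hidden, l1=1e-3, lr=0.1, epochs=300, seed=0):
    """Train a one-hidden-layer autoencoder with an L1 penalty on the
    hidden activations (assumed sparsity term) and return the codes."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)        # hidden codes in (0, 1)
        R = H @ W2 + b2                 # linear reconstruction
        dR = (R - X) / n                # gradient of the squared error
        dH = dR @ W2.T + l1 / n         # plus gradient of the L1 penalty
        dZ = dH * H * (1.0 - H)         # back through the sigmoid
        W2 -= lr * (H.T @ dR)
        b2 -= lr * dR.sum(axis=0)
        W1 -= lr * (X.T @ dZ)
        b1 -= lr * dZ.sum(axis=0)
    return sigmoid(X @ W1 + b1)

def deep_hybrid_features(X, n_hidden=4, n_components=3):
    """Assumed combination: concatenate SAE codes and PCA projections."""
    Xn = min_max_normalize(X)
    return np.hstack([sae_encode(Xn, n_hidden), pca_encode(Xn, n_components)])

# Synthetic stand-in for request features (not the ISCX-URL data).
rng = np.random.default_rng(42)
X_demo = rng.random((200, 8)) * [1, 10, 100, 1, 5, 50, 2, 20]
Z = deep_hybrid_features(X_demo)
print(Z.shape)  # (200, 7)
```

In a full pipeline, the matrix `Z` and the corresponding labels would then be passed to a gradient-boosted classifier such as `lightgbm.LGBMClassifier`; that step is omitted here to keep the sketch dependency-light.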

Funding

This work was not funded by any organization.

Author information

Corresponding author

Correspondence to Ömer Kasim.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Experimental results of the compared methods with TP, FP, TN and FN metrics

TP, FP, TN and FN counts of the compared methods.

| Methods | TP | FP | TN | FN |
|---|---|---|---|---|
| CNN-LSTM | 1380 | 36 | 1524 | 38 |
| LightGBM (ISCX) | 1399 | 17 | 1540 | 22 |
| LightGBM (SAE) | 1375 | 41 | 1515 | 47 |
| LightGBM (PCA) | 1387 | 29 | 1523 | 39 |
| SVM (ISCX) | 1363 | 53 | 1530 | 32 |
| SVM (SAE) | 1345 | 71 | 1507 | 55 |
| SVM (PCA) | 1380 | 38 | 1512 | 48 |
| MLP (ISCX) | 1353 | 63 | 1518 | 42 |
| MLP (SAE) | 1354 | 62 | 1518 | 44 |
| MLP (PCA) | 1369 | 47 | 1514 | 48 |
| Proposed SAE and PCA with LightGBM | 1407 | 9 | 1559 | 3 |
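As a worked check, the standard classification metrics can be recomputed from the counts in this appendix. The sketch below assumes the columns are in the printed order (TP, FP, TN, FN) and uses the usual textbook definitions; the function name `confusion_metrics` is hypothetical and only two rows are shown.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Counts copied from the appendix table, in its printed column order.
rows = {
    "LightGBM (ISCX)": (1399, 17, 1540, 22),
    "Proposed SAE and PCA with LightGBM": (1407, 9, 1559, 3),
}

for name, counts in rows.items():
    m = confusion_metrics(*counts)
    print(f"{name}: accuracy = {m['accuracy']:.4f}")
```

For the proposed method this gives (1407 + 1559) / 2978, which rounds to the 99.6% accuracy reported in the abstract.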

About this article

Cite this article

Kasim, Ö. Automatic detection of phishing pages with event-based request processing, deep-hybrid feature extraction and light gradient boosted machine model. Telecommun Syst 78, 103–115 (2021). https://doi.org/10.1007/s11235-021-00799-6

  • DOI: https://doi.org/10.1007/s11235-021-00799-6
