Automatic detection of phishing pages with event-based request processing, deep-hybrid feature extraction and light gradient boosted machine model

Kasim, Ömer

doi:10.1007/s11235-021-00799-6

Published: 19 May 2021

Telecommunication Systems, Volume 78, pages 103–115 (2021)

Ömer Kasim ORCID: orcid.org/0000-0003-4021-5412¹

Abstract

Phishing attacks that target unwary users are a serious threat to cyber security. It is important to quickly distinguish phishing web pages from legitimate ones. Although the studies in the literature detect phishing successfully, the problem of a high false-positive rate after the web page request has been processed remains to be resolved. The novelty of this study is that classification of deep-hybrid features with the Light Gradient Boosted Machine (LightGBM) model is triggered as an event when the web address is entered in the address bar of the browser. Thus, phishing can be detected at every request entry, before the request is completed. In the proposed approach, normalized features extracted from web page requests are fed to Sparse Autoencoder (SAE) and Principal Component Analysis (PCA) methods, which together encode the deep-hybrid features. The LightGBM classifier can effectively distinguish legitimate pages from phishing attacks using these features. The ISCX-URL phishing dataset is used to measure and validate the performance of the proposed approach. The proposed method classifies the SAE-PCA encoded features with the LightGBM model at an accuracy of 99.6% within the event. The results show that the proposed approach achieves better classification performance metrics than most other methods. Compared to other models, this accuracy contributes to solving the false-positive problem before requests are processed.
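The event-based flow described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the lexical features, the normalization bounds, the 0.5 threshold, and the names `lexical_features`, `min_max_normalize`, `RequestGate` and `toy_score` are all hypothetical. The scoring callable is a placeholder for the SAE-PCA encoding followed by the trained LightGBM model, neither of which is reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence
from urllib.parse import urlsplit


def lexical_features(url: str) -> List[float]:
    """Hypothetical raw features; the paper's actual feature set is not listed here."""
    host = urlsplit(url).hostname or ""
    return [
        float(len(url)),                       # total URL length
        float(host.count(".")),                # dots in the host name
        float(sum(c.isdigit() for c in url)),  # number of digits
        float("@" in url),                     # 1.0 if '@' appears anywhere
    ]


def min_max_normalize(x: Sequence[float], lo: Sequence[float],
                      hi: Sequence[float]) -> List[float]:
    """Scale each feature to [0, 1]; bounds are assumed to come from training data."""
    out = []
    for v, a, b in zip(x, lo, hi):
        if b == a:
            out.append(0.0)  # constant feature carries no information
        else:
            out.append(min(1.0, max(0.0, (v - a) / (b - a))))
    return out


@dataclass
class RequestGate:
    """Runs the classifier as an event, before the request is sent."""
    score: Callable[[Sequence[float]], float]  # placeholder for encoder + model
    lo: Sequence[float]
    hi: Sequence[float]
    threshold: float = 0.5                     # assumed decision threshold

    def on_url_entered(self, url: str) -> bool:
        """Return True if the request may proceed, False if it is blocked."""
        features = min_max_normalize(lexical_features(url), self.lo, self.hi)
        return self.score(features) < self.threshold


def toy_score(features: Sequence[float]) -> float:
    """Toy stand-in for the trained model: mean of the normalized features."""
    return sum(features) / len(features)


gate = RequestGate(score=toy_score, lo=[0, 0, 0, 0], hi=[100, 5, 20, 1])
print(gate.on_url_entered("https://example.org/"))
print(gate.on_url_entered(
    "http://secure.login.account.update.example.test/@verify?id=12345678901234567890"))
```

With the toy scorer, the first URL is allowed and the second is blocked. In a real deployment the `score` callable would wrap the fitted encoder and classifier, so the gate itself would not need to change.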

Funding

This work was not funded by any organization.

Author information

Authors and Affiliations

Department of Electrical and Electronics Engineering, Simav Technology Faculty, Kutahya Dumlupinar University, Kutahya, 43500, Turkey
Ömer Kasim

Corresponding author

Correspondence to Ömer Kasim.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Experimental results of the compared methods in terms of TP, FP, TN and FN

TP, FP, TN and FN counts of the compared methods.

Methods	TP	FP	TN	FN
CNN-LSTM	1380	36	1524	38
LightGBM (ISCX)	1399	17	1540	22
LightGBM (SAE)	1375	41	1515	47
LightGBM (PCA)	1387	29	1523	39
SVM (ISCX)	1363	53	1530	32
SVM (SAE)	1345	71	1507	55
SVM (PCA)	1380	38	1512	48
MLP (ISCX)	1353	63	1518	42
MLP (SAE)	1354	62	1518	44
MLP (PCA)	1369	47	1514	48
Proposed SAE and PCA with LightGBM	1407	9	1559	3
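
As a check on how the counts above relate to the reported accuracy, the following sketch derives standard metrics from three of the table rows. It takes the column labels at face value and applies the usual textbook definitions; whether the paper computes precision, recall and false-positive rate in exactly this way is an assumption. The names `Counts` and `metrics` are illustrative.

```python
from typing import Dict, NamedTuple


class Counts(NamedTuple):
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def metrics(c: Counts) -> Dict[str, float]:
    """Standard confusion-matrix metrics (textbook definitions)."""
    total = c.tp + c.fp + c.tn + c.fn
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)
    return {
        "accuracy": (c.tp + c.tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": c.fp / (c.fp + c.tn),  # false-positive rate
    }


# Rows copied from the table above, in TP, FP, TN, FN order.
rows = {
    "CNN-LSTM": Counts(1380, 36, 1524, 38),
    "LightGBM (ISCX)": Counts(1399, 17, 1540, 22),
    "Proposed SAE and PCA with LightGBM": Counts(1407, 9, 1559, 3),
}

for name, c in rows.items():
    m = metrics(c)
    print(f"{name}: accuracy={m['accuracy']:.4f} fpr={m['fpr']:.4f}")
```

For the proposed method this gives (1407 + 1559) / 2978, which rounds to the 99.6% accuracy stated in the abstract.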

About this article

Cite this article

Kasim, Ö. Automatic detection of phishing pages with event-based request processing, deep-hybrid feature extraction and light gradient boosted machine model. Telecommun Syst 78, 103–115 (2021). https://doi.org/10.1007/s11235-021-00799-6

Accepted: 27 April 2021
Published: 19 May 2021
Issue Date: September 2021
DOI: https://doi.org/10.1007/s11235-021-00799-6
