Cascade Generalization Based Functional Tree for Website Phishing Detection

Balogun, Abdullateef O.; Adewole, Kayode S.; Bajeh, Amos O.; Jimoh, Rasheed G.

doi:10.1007/978-981-16-8059-5_17


Part of the book series: Communications in Computer and Information Science (CCIS, volume 1487)

Included in the following conference series:

International Conference on Advances in Cyber Security


Abstract

The web and the wider internet are now used to deliver a broad range of services, from financial to medical. This adoption has been accompanied by a rise in cybersecurity incidents over the years. A prominent one is the phishing attack, in which a malicious website mimics the appearance and functionality of a legitimate website to collect the credentials users need to access services. Several measures have been proposed to mitigate this attack; blacklisting and variants of machine learning approaches have been employed and have yielded good performance. However, there is still a need to increase the rate at which phishing attacks are identified and to reduce the rate of false positives. This study proposes the use of a functional tree (FT) machine learning approach to mitigate phishing attacks. FT, a hybridization of multivariate decision trees and discriminant functions using constructive induction, uses logistic regression for splitting tree nodes and for leaf prediction, unlike the conventional decision tree, which simply splits nodes based on the data. Furthermore, a variant of FT based on cascade generalization (CG-FT) is proposed. Three datasets with varied instance distributions, both balanced and imbalanced, are used in the empirical investigation of the performance of the proposed CG-FT. The results showed that FT outperforms some selected baseline classifiers. Relative to FT, the CG-FT techniques improved the detection of phishing attacks, with Area Under the Curve (AUC) ranging from 98 to 99.6% and True Positive rate (TP-rate) ranging from 92 to 97% across the datasets. The false-positive rate was also reduced, with values ranging from 1.7 to 6.1%. The proposed CG-FT showed improvement over all the other reviewed approaches on the studied performance metrics. Overall, the use of FT and its hybridization with cascade generalization (CG-FT) improved performance in the mitigation of phishing attacks.
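
The cascade generalization idea named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it does not build a functional tree. It only assumes the general cascade scheme, in which the class probability produced by a level-0 learner is appended to the original attributes before a level-1 learner is trained. The logistic regression, the decision stump, the synthetic two-feature data, and every function name below are hypothetical stand-ins chosen for illustration. Scores are computed on the training data, so they say nothing about the results reported in the paper.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, epochs=300, lr=0.5):
    # Level-0 learner (assumed stand-in): logistic regression
    # trained by batch gradient descent.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def extend(model, X):
    # Cascade step: append the level-0 class probability as a new attribute.
    return [list(x) + [predict_proba(model, x)] for x in X]

def fit_stump(X, y):
    # Level-1 learner (assumed stand-in): the single-attribute threshold
    # split with the fewest training errors.
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for sign in (1, -1):
                preds = [1 if sign * (x[j] - t) > 0 else 0 for x in X]
                errors = sum(p != yi for p, yi in zip(preds, y))
                if best is None or errors < best[0]:
                    best = (errors, j, t, sign)
    return best[1:]

def predict_stump(stump, x):
    j, t, sign = stump
    return 1 if sign * (x[j] - t) > 0 else 0

def accuracy(stump, data, labels):
    hits = sum(predict_stump(stump, x) == yi for x, yi in zip(data, labels))
    return hits / len(labels)

def tp_fp_rates(preds, labels):
    # TP-rate = TP / positives, FP-rate = FP / negatives.
    tp = sum(1 for p, t in zip(preds, labels) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(preds, labels) if p == 1 and t == 0)
    pos = sum(labels)
    return tp / pos, fp / (len(labels) - pos)

# Synthetic data (assumption): the label depends on a linear combination
# of both attributes, which a single axis-parallel split cannot capture.
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [1 if x[0] + x[1] > 1.0 else 0 for x in X]

level0 = fit_logistic(X, y)
X_ext = extend(level0, X)           # original attributes + level-0 probability
stump_raw = fit_stump(X, y)         # level-1 learner without the cascade
stump_cascade = fit_stump(X_ext, y) # level-1 learner on the extended data

acc_raw = accuracy(stump_raw, X, y)
acc_cascade = accuracy(stump_cascade, X_ext, y)
tp_rate, fp_rate = tp_fp_rates([predict_stump(stump_cascade, x) for x in X_ext], y)

print("training accuracy without cascade:", acc_raw)
print("training accuracy with cascade:", acc_cascade)
print("TP-rate:", tp_rate, "FP-rate:", fp_rate)
```

Because the extended data keeps the original attributes unchanged, the level-1 stump can always fall back to a split on a raw attribute, so its training accuracy cannot be lower than that of the stump trained without the cascade. Any gain comes from the appended probability, which exposes the linear combination learned at level 0 as a single attribute.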




Author information

Authors and Affiliations

Department of Computer Science, University of Ilorin, PMB 1515, Ilorin, Nigeria
Abdullateef O. Balogun, Kayode S. Adewole, Amos O. Bajeh & Rasheed G. Jimoh
Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, 32610, Bandar Seri Iskandar, Perak, Malaysia
Abdullateef O. Balogun


Corresponding author

Correspondence to Abdullateef O. Balogun.

Editor information

Editors and Affiliations

Hodeidah University, Hodeidah, Yemen
Nibras Abdullah
Universiti Sains Malaysia, Penang, Malaysia
Selvakumar Manickam
Universiti Sains Malaysia, Penang, Malaysia
Mohammed Anbar


About this paper

Cite this paper

Balogun, A.O., Adewole, K.S., Bajeh, A.O., Jimoh, R.G. (2021). Cascade Generalization Based Functional Tree for Website Phishing Detection. In: Abdullah, N., Manickam, S., Anbar, M. (eds) Advances in Cyber Security. ACeS 2021. Communications in Computer and Information Science, vol 1487. Springer, Singapore. https://doi.org/10.1007/978-981-16-8059-5_17


DOI: https://doi.org/10.1007/978-981-16-8059-5_17
Published: 01 January 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-8058-8
Online ISBN: 978-981-16-8059-5
eBook Packages: Computer Science, Computer Science (R0)
