Abstract
Since the advent of the web, the Internet has been adopted for delivering services ranging from finance to medicine. This adoption has been accompanied by a growing number of cybersecurity issues over the years. A prominent one is the phishing attack, in which malicious websites mimic the appearance and functionality of legitimate websites to collect the credentials users need to access services. Several measures have been proposed to mitigate this attack: blacklisting and variants of machine learning approaches have been employed, with good results. However, the rate at which phishing attacks are identified still needs to increase, and the rate of false positives needs to decrease. This study proposes a functional tree (FT) machine learning approach to mitigate phishing attacks. FT, which hybridizes multivariate decision trees and discriminant functions through constructive induction, uses logistic regression both to split tree nodes and to make leaf predictions, unlike conventional decision trees, which split nodes directly on the data attributes. Furthermore, a variant of FT based on cascade generalization (CG-FT) is proposed. Three datasets with varied instance distributions, both balanced and imbalanced, are used to empirically investigate the performance of the proposed CG-FT. The results showed that FT outperformed selected baseline classifiers. Relative to FT, CG-FT improved phishing detection, with Area Under the Curve (AUC) values of 98–99.6% and True Positive rates (TP-rate) of 92–97% across the datasets, and reduced the false-positive rate to values between 1.7% and 6.1%. CG-FT also outperformed all the other reviewed approaches on the studied performance metrics. Overall, FT and its hybridization with cascade generalization (CG-FT) improved performance in mitigating phishing attacks.
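To make the cascade-generalization idea more concrete, here is a minimal sketch. It is not the authors' CG-FT. It assumes a two-level cascade: a logistic regression (the model FT uses at its nodes) at level 0, whose estimated P(phishing) is appended to each instance as a new attribute, and a univariate decision stump at level 1 that may split on either the original or the constructed attribute. The stump substitution, the synthetic grid dataset, and all function names (`fit_logistic`, `extend`, `fit_stump`, `rates`) are hypothetical illustrations. For simplicity, the model is scored on its own training data. The diagonal class boundary cannot be captured by any single axis-parallel split on the original attributes, but a single split on the constructed probability attribute captures it exactly. TP-rate and FP-rate are computed as in the abstract's metrics.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Logistic regression by batch gradient descent on log-loss; bias is the last weight."""
    d = len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def predict_proba(w, X):
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]) for xi in X]

def extend(X, w):
    """Cascade step: append the level-0 estimate of P(class 1) as a new attribute.
    With two classes, P(class 0) = 1 - P(class 1) is redundant, so it is omitted."""
    return [xi + [p] for xi, p in zip(X, predict_proba(w, X))]

def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)

def fit_stump(X, y):
    """Univariate split (one-node tree): best (feature, threshold, polarity) by accuracy."""
    best_acc, best = -1.0, None
    for j in range(len(X[0])):
        for t in sorted({xi[j] for xi in X}):
            for pol in (1, 0):
                acc = accuracy(y, [pol if xi[j] > t else 1 - pol for xi in X])
                if acc > best_acc:
                    best_acc, best = acc, (j, t, pol)
    return best

def predict_stump(stump, X):
    j, t, pol = stump
    return [pol if xi[j] > t else 1 - pol for xi in X]

def rates(y_true, y_pred):
    """TP-rate = TP / (TP + FN); FP-rate = FP / (FP + TN)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), fp / (fp + tn)

# Hypothetical data on an 11 x 11 grid: class 1 ("phishing") when a + b > 10.
X = [[a / 10, b / 10] for a in range(11) for b in range(11)]
y = [1 if a + b > 10 else 0 for a in range(11) for b in range(11)]

plain = fit_stump(X, y)                  # single split on original attributes only
w0 = fit_logistic(X, y)                  # level-0 model
X_ext = extend(X, w0)                    # original attributes + constructed attribute
cascade = fit_stump(X_ext, y)            # level-1 model on the extended attributes

acc_plain = accuracy(y, predict_stump(plain, X))
acc_cascade = accuracy(y, predict_stump(cascade, X_ext))
tp_rate, fp_rate = rates(y, predict_stump(cascade, X_ext))
print(acc_plain, acc_cascade, tp_rate, fp_rate)
```

The same extension must be applied to new instances before they reach the level-1 model, so `extend` is reused at prediction time with the level-0 weights learned during training.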
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Balogun, A.O., Adewole, K.S., Bajeh, A.O., Jimoh, R.G. (2021). Cascade Generalization Based Functional Tree for Website Phishing Detection. In: Abdullah, N., Manickam, S., Anbar, M. (eds) Advances in Cyber Security. ACeS 2021. Communications in Computer and Information Science, vol 1487. Springer, Singapore. https://doi.org/10.1007/978-981-16-8059-5_17
DOI: https://doi.org/10.1007/978-981-16-8059-5_17
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-8058-8
Online ISBN: 978-981-16-8059-5
eBook Packages: Computer Science, Computer Science (R0)