Cascade Generalization Based Functional Tree for Website Phishing Detection

  • Conference paper
Advances in Cyber Security (ACeS 2021)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1487)

Abstract

The web and the internet are now used to deliver a wide range of services, from financial to medical. This adoption has been accompanied by a rise in cybersecurity issues over the years. A prominent one is the phishing attack, in which a malicious website mimics the appearance and functionality of a legitimate website to collect the credentials users need to access services. Several measures have been proposed to mitigate this attack; blacklisting and variants of machine learning approaches have been employed and have yielded good performance. However, there is still a need to increase the rate at which phishing attacks are identified and to reduce the rate of false positives. This study proposes the use of a functional tree (FT) machine learning approach to mitigate phishing attacks. FT, a hybridization of multivariate decision trees and discriminant functions built using constructive induction, uses logistic regression for splitting tree nodes and for leaf prediction, unlike the conventional decision tree, which simply splits nodes based on the data. Furthermore, a variant of FT based on cascade generalization (CG-FT) is proposed. Three datasets with varied instance distributions, both balanced and imbalanced, are used in the empirical investigation of the performance of the proposed CG-FT. The results showed that FT performs better than some selected baseline classifiers. Relative to FT, the CG-FT techniques improved the detection of phishing attacks, with Area Under the Curve (AUC) and True Positive rate (TP-rate) ranging from 98–99.6% and 92–97%, respectively, across the datasets. The false-positive rate was also reduced, with values ranging from 1.7 to 6.1%. The proposed CG-FT improved on all the other reviewed approaches on the studied performance metrics. Overall, FT and its hybridization with cascade generalization (CG-FT) improved performance in the mitigation of phishing attacks.
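The cascade generalization idea named in the abstract can be illustrated with a minimal sketch: a level-0 learner is trained first, its class probabilities are appended to every instance as new attributes, and a level-1 learner is then trained on the extended data. The sketch below is not the authors' CG-FT implementation. It assumes a plain logistic regression as the level-0 learner and a single-split decision stump as a simplified stand-in for a functional tree, and it uses synthetic two-feature data with a diagonal class boundary. All function names and parameter values are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, epochs=300):
    # Level-0 learner (assumed): logistic regression, batch gradient descent.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def extend(X, model):
    # Cascade step: append the level-0 class probabilities as new attributes.
    w, b = model
    extended = []
    for x in X:
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
        extended.append(list(x) + [1.0 - p, p])
    return extended


def fit_stump(X, y):
    # Level-1 learner (stand-in for a tree): one split chosen by training error.
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for left in (0, 1):
                errors = sum((left if x[j] <= t else 1 - left) != yi
                             for x, yi in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, j, t, left)
    return best[1:]


def stump_predict(stump, X):
    j, t, left = stump
    return [left if x[j] <= t else 1 - left for x in X]


def accuracy(pred, y):
    return sum(p == t for p, t in zip(pred, y)) / len(y)


def rates(pred, y):
    # TP-rate and FP-rate, treating class 1 as the positive class.
    tp = sum(p == 1 and t == 1 for p, t in zip(pred, y))
    fp = sum(p == 1 and t == 0 for p, t in zip(pred, y))
    pos = sum(y)
    return tp / pos, fp / (len(y) - pos)


def make_data(n, seed):
    # Synthetic data (assumption): class 1 when the two features sum above 0.
    rng = random.Random(seed)
    X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    y = [1 if a + b > 0 else 0 for a, b in X]
    return X, y


X_train, y_train = make_data(200, seed=0)
X_test, y_test = make_data(200, seed=1)

level0 = fit_logistic(X_train, y_train)
plain_stump = fit_stump(X_train, y_train)                     # raw attributes only
cascade_stump = fit_stump(extend(X_train, level0), y_train)   # raw + probabilities

pred_plain = stump_predict(plain_stump, X_test)
pred_cascade = stump_predict(cascade_stump, extend(X_test, level0))
acc_plain = accuracy(pred_plain, y_test)
acc_cascade = accuracy(pred_cascade, y_test)

print("plain stump accuracy:", acc_plain)
print("cascade stump accuracy:", acc_cascade)
print("cascade TP-rate, FP-rate:", rates(pred_cascade, y_test))
```

On this toy problem the stump alone can only cut along one axis, while the cascaded stump can split on the appended probability attribute, which encodes a combination of both features. This is the same motivation the abstract gives for splitting on a fitted function rather than on single raw attributes.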

Author information

Corresponding author

Correspondence to Abdullateef O. Balogun.

Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

Cite this paper

Balogun, A.O., Adewole, K.S., Bajeh, A.O., Jimoh, R.G. (2021). Cascade Generalization Based Functional Tree for Website Phishing Detection. In: Abdullah, N., Manickam, S., Anbar, M. (eds) Advances in Cyber Security. ACeS 2021. Communications in Computer and Information Science, vol 1487. Springer, Singapore. https://doi.org/10.1007/978-981-16-8059-5_17

  • DOI: https://doi.org/10.1007/978-981-16-8059-5_17

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-8058-8

  • Online ISBN: 978-981-16-8059-5

  • eBook Packages: Computer Science, Computer Science (R0)
