Development of Proposed Model Using Random Forest with Optimization Technique for Classification of Phishing Website

  • Original Research
SN Computer Science

Abstract

A phishing website is a fraudulent site deliberately built to mimic a trustworthy one in order to steal private and sensitive data from unwary users. The word “phishing” derives from “fishing”: attackers use fake websites as bait to trick people into giving up personal information such as usernames, passwords, and bank account details. Phishing websites rely on social engineering to create a false sense of urgency or anxiety, and they are characterized by deceptive designs that imitate genuine websites and by URL manipulation through subtle misspellings or domain variations. Phishing attacks frequently start with deceptive emails, messages, websites, or advertisements containing links that lead visitors to these malicious sites. This paper addresses phishing website classification using machine-learning classifiers combined with Particle Swarm Optimization (PSO) feature selection. We evaluate K-Nearest Neighbours (K-NN), Gaussian Naïve Bayes (GNB), Support Vector Machine (SVM), Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost), and ensemble classifiers. PSO feature selection is used to reduce the number of features in the phishing website dataset, with the aim of lowering computational cost and improving classification accuracy. We also compare the performance of the classifiers with and without feature selection; the proposed RF-PSO model achieves better performance, with an accuracy of 97.84%, a recall of 99.00%, and an F1-score of 98.69%, using 14 features and less computational time.
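To make the idea of PSO-based feature selection concrete, the following is a minimal sketch of a binary PSO search over feature masks. It is not the authors' implementation: the function name `binary_pso_select`, the hyperparameters (`w`, `c1`, `c2`, swarm size, iteration count), and the toy fitness function with its "informative" feature indices are all assumptions made for illustration. In a wrapper setting such as the one the abstract describes, the fitness would instead be a classifier's validation accuracy on the selected columns.

```python
import math
import random


def binary_pso_select(fitness, n_features, n_particles=12, n_iters=30,
                      w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO sketch: each particle is a 0/1 mask over the features.

    `fitness` is any callable that scores a mask (higher is better).
    All hyperparameter defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Random initial masks and small random velocities.
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)]
           for _ in range(n_particles)]
    # Personal bests start at the initial positions.
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                # Standard velocity update: inertia + cognitive + social terms.
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp so the sigmoid below never saturates completely.
                vel[i][d] = max(-6.0, min(6.0, v))
                # Sigmoid turns the velocity into P(bit = 1).
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


# Hypothetical stand-in for a classifier-based fitness: reward selecting the
# (made-up) informative features and penalise the size of the subset.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask):
    if not any(mask):
        return 0.0  # an empty feature subset is useless
    hits = sum(1 for i in INFORMATIVE if mask[i])
    return hits / len(INFORMATIVE) - 0.02 * sum(mask)


best_mask, best_fit = binary_pso_select(toy_fitness, n_features=10)
print("selected feature indices:", [i for i, b in enumerate(best_mask) if b])
print("fitness of best mask:", round(best_fit, 3))
```

Passing the fitness as a callable keeps the search independent of the classifier: one plausible choice, under the same assumptions, would be the cross-validated accuracy of a random forest trained only on the columns where the mask is 1.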

Data Availability Statement

The Phishing Website dataset was obtained from the University of California, Irvine (UCI) open-source Machine Learning Repository. Dataset link: https://archive.ics.uci.edu/ml/datasets/Phishing+Websites.
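As a rough illustration of how such a dataset might be read for experiments, the sketch below parses a dense ARFF file into feature rows and labels using only the standard library. It assumes the data are distributed in ARFF form with integer-coded attributes and the class label in the last column; the helper names `read_dense_arff` and `split_features_label` and the tiny in-memory sample (including its attribute names) are hypothetical and do not reproduce the real file.

```python
import io


def read_dense_arff(stream):
    """Parse a dense, comma-separated ARFF stream with integer-coded values.

    Handles only the simple subset assumed here: '%' comments, '@attribute'
    lines with unquoted names, and plain comma-separated '@data' rows.
    """
    names, rows, in_data = [], [], False
    for raw in stream:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if in_data:
            rows.append([int(value) for value in line.split(",")])
        elif line.lower().startswith("@attribute"):
            names.append(line.split()[1])  # second token is the attribute name
        elif line.lower().startswith("@data"):
            in_data = True
    return names, rows


def split_features_label(rows):
    """Treat the last column as the class label (an assumption)."""
    features = [row[:-1] for row in rows]
    labels = [row[-1] for row in rows]
    return features, labels


# Made-up two-row sample, only to show the expected shape of the input.
SAMPLE = """@relation toy
@attribute feature_a {-1,1}
@attribute feature_b {1,0,-1}
@attribute label {-1,1}
@data
-1,1,-1
1,0,1
"""

names, rows = read_dense_arff(io.StringIO(SAMPLE))
X, y = split_features_label(rows)
print(names, X, y)
```

For a downloaded file, the same function could be called on an open text file handle instead of the `io.StringIO` sample.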

Funding

Not applicable.

Author information

Contributions

Mr. Prakash Pathak implemented the model and carried out the experimental work. He also contributed to data preprocessing, data analysis, and the collection of review papers. Dr. Akhilesh Kumar Shrivas contributed to the design of the model and to the writing of the manuscript.

Corresponding author

Correspondence to Prakash Pathak.

Ethics declarations

Conflict of Interest

This article has not been published in any peer-reviewed journal, nor is it under consideration by any other journal. All figures, tables, and text are original and are not copyrighted material from any other article. No funding was received to assist with the preparation of this manuscript. The authors have no competing interests to declare that are relevant to the content of this article.

Informed Consent

The authors consent to all information presented in this manuscript.

Research Involving Human and/or Animals

Not applicable.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Pathak, P., Shrivas, A.K. Development of Proposed Model Using Random Forest with Optimization Technique for Classification of Phishing Website. SN COMPUT. SCI. 5, 1059 (2024). https://doi.org/10.1007/s42979-024-03388-x

  • DOI: https://doi.org/10.1007/s42979-024-03388-x
