Abstract
Phishing is a cyber-attack in which a fake website that imitates a trusted one is used to steal sensitive information such as usernames, passwords, and credit card details. Despite the many anti-phishing approaches in use, online users are still being tricked into revealing such information. In this paper, we therefore propose an intelligent model that uses an ensemble of feature selection techniques to detect phishing sites with high accuracy. We evaluated several machine learning algorithms to identify the best classifiers and built an ensemble model from the Random Forest, Decision Tree, and XGBoost algorithms. We also applied several feature selection ensembles to the classification of phishing websites. In our experiments, the model achieved an accuracy of 97.51% on the UCI phishing dataset (Dataset 1) and 98.45% on the "Phishing dataset for machine learning" from Mendeley (Dataset 2). The proposed model also outperformed the baseline models by a significant margin.
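The abstract does not specify which feature selectors make up the ensemble or how the Random Forest, Decision Tree, and XGBoost outputs are merged. The sketch below illustrates the general idea under stated assumptions only. Three simple filter scores (information gain, absolute Pearson correlation, and a class-mean difference, all chosen here for illustration and not necessarily the ones used in the paper) each vote for their top-k features, and a feature is kept when a majority of selectors pick it. The trained classifiers' predictions are then combined by hard majority voting, which is also an assumed combination rule. The names `select_features` and `majority_vote` are hypothetical, and training the actual classifiers is omitted.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(column, labels):
    """Information gain of a discrete feature with respect to the class labels."""
    n = len(labels)
    conditional = 0.0
    for v in set(column):
        subset = [y for x, y in zip(column, labels) if x == v]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def abs_correlation(column, labels):
    """Absolute Pearson correlation between a feature and the labels (0 if constant)."""
    n = len(labels)
    mx, my = sum(column) / n, sum(labels) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(column, labels))
    sx = math.sqrt(sum((x - mx) ** 2 for x in column))
    sy = math.sqrt(sum((y - my) ** 2 for y in labels))
    return 0.0 if sx == 0 or sy == 0 else abs(cov / (sx * sy))


def mean_difference(column, labels):
    """Absolute difference of the feature mean between class 1 and all other rows."""
    pos = [x for x, y in zip(column, labels) if y == 1]
    neg = [x for x, y in zip(column, labels) if y != 1]
    if not pos or not neg:
        return 0.0
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def select_features(rows, labels, k,
                    scorers=(information_gain, abs_correlation, mean_difference)):
    """Hypothetical selector ensemble: each scorer votes for its top-k features,
    and features chosen by a strict majority of scorers are kept."""
    n_features = len(rows[0])
    columns = [[r[j] for r in rows] for j in range(n_features)]
    votes = Counter()
    for score in scorers:
        scores = [score(col, labels) for col in columns]
        # Rank by descending score; break ties by feature index.
        top = sorted(range(n_features), key=lambda j: (-scores[j], j))[:k]
        votes.update(top)
    return sorted(j for j, v in votes.items() if v > len(scorers) / 2)


def majority_vote(*predictions):
    """Hard majority vote over per-sample predictions from several classifiers."""
    return [Counter(p).most_common(1)[0][0] for p in zip(*predictions)]


# Toy data in the UCI style (feature values -1/0/1, label 1 = phishing, -1 = legitimate).
rows = [
    [1, 0, 1],
    [1, 0, -1],
    [1, 0, 1],
    [-1, 0, -1],
    [-1, 0, 1],
    [-1, 0, -1],
]
labels = [1, 1, 1, -1, -1, -1]

print(select_features(rows, labels, k=1))  # [0]
# Stand-ins for predictions from three trained classifiers (e.g. RF, DT, XGBoost).
print(majority_vote([1, 1, -1], [1, -1, -1], [-1, 1, -1]))  # [1, 1, -1]
```

Feature 0 matches the label exactly and feature 1 is constant, so every scorer ranks feature 0 first and feature 1 last. In practice the per-classifier predictions would come from models trained on the selected columns only.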

Acknowledgements
The authors thank the Ministry of Electronics and Information Technology (MeitY), Government of India, for partially supporting this research.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Ramana, A.V., Rao, K.L. & Rao, R.S. Stop-Phish: an intelligent phishing detection method using feature selection ensemble. Soc. Netw. Anal. Min. 11, 110 (2021). https://doi.org/10.1007/s13278-021-00829-w
DOI: https://doi.org/10.1007/s13278-021-00829-w