
Stop-Phish: an intelligent phishing detection method using feature selection ensemble

  • Original Article
Social Network Analysis and Mining

Abstract

Phishing is a cyber-attack in which a fake website imitates a trusted one to steal sensitive information such as usernames, passwords, and credit card details. Despite the availability of several anti-phishing approaches, online users are still tricked into revealing sensitive information. In this paper, we propose an intelligent model that uses an ensemble of feature selection techniques to detect phishing sites with high accuracy. We evaluated various machine learning algorithms to identify the best classifier and developed an ensemble model from the Random Forest, Decision Tree, and XGBoost algorithms. We also used various feature selection ensembles for the classification of phishing websites. In our experiments, the model achieved an accuracy of 97.51% on the dataset from UCI (Dataset 1) and 98.45% on the phishing dataset for machine learning from Mendeley (Dataset 2). The proposed model also outperformed the baseline models by a significant margin.
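The abstract does not say how the feature selection methods or the classifiers are combined, so the following is only a minimal sketch of one common way to build such ensembles, not the authors' implementation. It assumes that each selector votes for its top-k features and that a feature is kept when enough selectors agree; the two scoring functions (absolute Pearson correlation and mutual information), the helper names, and the toy data are all hypothetical. The `majority_vote` helper illustrates hard voting over the predictions of several classifiers (for example Random Forest, Decision Tree, and XGBoost), which would be trained separately.

```python
from collections import Counter
from math import log2, sqrt


def pearson_score(column, labels):
    """Absolute Pearson correlation between one feature column and the labels."""
    n = len(column)
    mean_x = sum(column) / n
    mean_y = sum(labels) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(column, labels))
    var_x = sum((x - mean_x) ** 2 for x in column)
    var_y = sum((y - mean_y) ** 2 for y in labels)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information
    return abs(cov) / sqrt(var_x * var_y)


def mutual_info_score(column, labels):
    """Mutual information (in bits) between a discrete feature and the labels."""
    n = len(column)
    joint = Counter(zip(column, labels))
    px = Counter(column)
    py = Counter(labels)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def top_k(scores, k):
    """Indices of the k highest-scoring features."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def ensemble_select(rows, labels, scorers, k, min_votes):
    """Keep features that at least `min_votes` scorers rank in their top k."""
    columns = list(zip(*rows))
    votes = Counter()
    for scorer in scorers:
        scores = [scorer(col, labels) for col in columns]
        votes.update(top_k(scores, k))
    return sorted(i for i, v in votes.items() if v >= min_votes)


def majority_vote(predictions):
    """Hard voting: the most common label per sample across classifiers."""
    return [Counter(sample).most_common(1)[0][0] for sample in zip(*predictions)]


# Toy data (hypothetical): feature 0 equals the label, feature 1 agrees with it
# on 6 of 8 samples, and feature 2 is unrelated to it.
labels = [1, 1, 1, 1, -1, -1, -1, -1]
rows = [
    (1, 1, 1), (1, 1, -1), (1, 1, 1), (1, -1, -1),
    (-1, -1, 1), (-1, -1, -1), (-1, -1, 1), (-1, 1, -1),
]
selected = ensemble_select(
    rows, labels, [pearson_score, mutual_info_score], k=2, min_votes=2
)
print(selected)  # [0, 1]

# Predictions of three already-trained classifiers on three samples.
combined = majority_vote([[1, -1, 1], [1, 1, -1], [-1, -1, 1]])
print(combined)  # [1, -1, 1]
```

Requiring agreement between selectors trades recall for stability: a feature that only one scoring method favours is dropped, which tends to make the selected subset less sensitive to the quirks of any single method.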




Acknowledgements

The authors would like to thank the Ministry of Electronics & Information Technology (MeitY), Government of India, for its support of part of this research.

Author information

Corresponding author

Correspondence to A. V. Ramana.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ramana, A.V., Rao, K.L. & Rao, R.S. Stop-Phish: an intelligent phishing detection method using feature selection ensemble. Soc. Netw. Anal. Min. 11, 110 (2021). https://doi.org/10.1007/s13278-021-00829-w

  • DOI: https://doi.org/10.1007/s13278-021-00829-w
