Abstract
Phishing is one of the most common forms of cybercrime that Internet users face, and it violates basic cybersecurity principles. It is a fraudulent attack carried out over the Internet to obtain and use, without authorization, sensitive information such as usernames, passwords, credit card details, and bank account information. Common phishing attempts rely on email spoofing or instant messaging to convince a victim to visit a spoofed website, which then captures the victim’s information. In this work, we identify and analyze the most important features needed to detect spoofed websites by means of two new feature selection techniques. In the first technique, several underlying feature selection methods vote on each feature, and a feature is selected if the methods agree on it. In the second technique, the underlying methods vote in the same way, but a feature is selected if a majority of them vote for it. We also propose a phishing detection technique based on the AdaBoost and LightGBM ensemble methods to detect spoofed websites. The proposed method achieves very high accuracy compared with existing methods.
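The two voting rules described in the abstract can be illustrated with a minimal sketch. It assumes that "agree" means every underlying method selected the feature and that "majority" means strictly more than half of the methods did; the abstract does not state either threshold explicitly. The selector names (`chi2`, `info_gain`, `rfe`), the feature names, and the votes themselves are hypothetical and are not taken from the paper.

```python
from collections import Counter


def vote_counts(selections):
    """Count how many underlying selectors picked each feature."""
    counts = Counter()
    for chosen in selections.values():
        counts.update(set(chosen))  # each selector votes at most once per feature
    return counts


def consensus_select(selections):
    """Keep a feature only if every underlying selector picked it (assumed rule)."""
    n_selectors = len(selections)
    counts = vote_counts(selections)
    return sorted(f for f, c in counts.items() if c == n_selectors)


def majority_select(selections):
    """Keep a feature if more than half of the selectors picked it (assumed rule)."""
    n_selectors = len(selections)
    counts = vote_counts(selections)
    return sorted(f for f, c in counts.items() if c > n_selectors / 2)


# Hypothetical votes: each selector reports the set of features it would keep.
selections = {
    "chi2": {"url_length", "has_ip", "ssl_state", "web_traffic"},
    "info_gain": {"url_length", "ssl_state", "anchor_ratio"},
    "rfe": {"ssl_state", "anchor_ratio", "has_ip"},
}

consensus = consensus_select(selections)
majority = majority_select(selections)
print("consensus:", consensus)
print("majority:", majority)
```

In this toy input only `ssl_state` receives all three votes, so the consensus rule keeps a single feature, while the majority rule keeps every feature with at least two votes and drops `web_traffic`. Consensus therefore yields the smaller, stricter subset of the two.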
Cite this article
Alotaibi, B., Alotaibi, M. Consensus and majority vote feature selection methods and a detection technique for web phishing. J Ambient Intell Human Comput 12, 717–727 (2021). https://doi.org/10.1007/s12652-020-02054-3
DOI: https://doi.org/10.1007/s12652-020-02054-3