Abstract
In this paper, we propose a phishing website detection method that combines a meta-heuristic-based nonlinear regression algorithm with feature selection. To validate the method, we used a dataset of 11,055 phishing and legitimate webpages and selected 20 features to extract from these websites. Two feature selection methods, decision tree and wrapper, were applied to select the best feature subset; the wrapper method achieved a detection accuracy as high as 96.32%. After feature selection, two methods were implemented to detect fraudulent websites: a nonlinear regression model whose parameters are estimated by the harmony search (HS) meta-heuristic, and a support vector machine (SVM). The proposed HS algorithm uses a dynamic pitch adjustment rate to generate new harmonies. The HS-based nonlinear regression achieved accuracies of 94.13% and 92.80% on the training and test sets, respectively. Overall, the study finds that the HS-based nonlinear regression performs better than SVM.
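
The abstract gives neither the functional form of the regression model nor the exact pitch-adjustment schedule, so the following Python sketch assumes both. It uses a hypothetical model that is quadratic in each feature, scored against labels coded as +1/-1 and classified by the sign of the score, and an HS variant in which the pitch adjustment rate (PAR) rises linearly while the bandwidth decays exponentially. That schedule is borrowed from the improved-HS literature and is not necessarily the one used in this paper. The names `predict_score`, `mse`, `harmony_search` and `classify`, the parameter bounds, and the toy data are all illustrative assumptions. The sketch shows the general idea of letting HS search for regression parameters that minimize training error.

```python
import itertools
import math
import random


def predict_score(params, x):
    """Hypothetical nonlinear regression model (an assumption, not the paper's exact form):
    score = b + sum_j (w_j * x_j + q_j * x_j**2), with params = [b, w_1..w_d, q_1..q_d]."""
    d = len(x)
    b, w, q = params[0], params[1:1 + d], params[1 + d:1 + 2 * d]
    return b + sum(wj * xj + qj * xj * xj for wj, qj, xj in zip(w, q, x))


def mse(params, X, y):
    """Mean squared error between model scores and the +1/-1 labels."""
    return sum((predict_score(params, xi) - yi) ** 2 for xi, yi in zip(X, y)) / len(X)


def harmony_search(objective, dim, lower=-2.0, upper=2.0, hms=20, hmcr=0.9,
                   par_min=0.3, par_max=0.99, bw_max=1.0, bw_min=1e-4,
                   iterations=5000, seed=0):
    """Minimise `objective` over [lower, upper]^dim with harmony search.

    PAR grows linearly and the bandwidth shrinks exponentially over the run
    (an assumed dynamic schedule in the style of improved HS)."""
    rng = random.Random(seed)
    memory = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(hms)]
    costs = [objective(h) for h in memory]
    for t in range(iterations):
        frac = t / max(1, iterations - 1)
        par = par_min + (par_max - par_min) * frac
        bw = bw_max * math.exp(math.log(bw_min / bw_max) * frac)
        new = []
        for j in range(dim):
            if rng.random() < hmcr:                      # memory consideration
                value = memory[rng.randrange(hms)][j]
                if rng.random() < par:                   # pitch adjustment
                    value += rng.uniform(-1.0, 1.0) * bw
            else:                                        # random selection
                value = rng.uniform(lower, upper)
            new.append(min(upper, max(lower, value)))    # keep within bounds
        cost = objective(new)
        worst = max(range(hms), key=costs.__getitem__)
        if cost < costs[worst]:                          # replace the worst harmony
            memory[worst], costs[worst] = new, cost
    best = min(range(hms), key=costs.__getitem__)
    return memory[best], costs[best]


def classify(params, x):
    """Label +1 if the regression score is non-negative, else -1."""
    return 1 if predict_score(params, x) >= 0 else -1


if __name__ == "__main__":
    # Toy data (hypothetical): two features coded in {-1, 0, 1};
    # the label is +1 when x0 + x1 > 0, else -1.
    X = [list(p) for p in itertools.product([-1, 0, 1], repeat=2)]
    y = [1 if a + b > 0 else -1 for a, b in X]
    params, cost = harmony_search(lambda p: mse(p, X, y), dim=1 + 2 * 2)
    acc = sum(classify(params, xi) == yi for xi, yi in zip(X, y)) / len(X)
    print(f"training MSE={cost:.3f}, accuracy={acc:.2f}")
```

Running the script fits the toy data and prints the training MSE and accuracy. In a real experiment, the inputs would be the selected feature subset, and accuracy would be reported separately for training and held-out test data, as the abstract does. The objective here is plain squared error; a classification-oriented loss could be substituted without changing the search loop.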

Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Communicated by V. Loia.
About this article
Cite this article
Babagoli, M., Aghababa, M.P. & Solouk, V. Heuristic nonlinear regression strategy for detecting phishing websites. Soft Comput 23, 4315–4327 (2019). https://doi.org/10.1007/s00500-018-3084-2
DOI: https://doi.org/10.1007/s00500-018-3084-2