Heuristic nonlinear regression strategy for detecting phishing websites

  • Methodologies and Application
Soft Computing

Abstract

In this paper, we propose a phishing website detection method that combines a meta-heuristic-based nonlinear regression algorithm with a feature selection approach. To validate the proposed method, we used a dataset comprising 11,055 phishing and legitimate webpages and selected 20 features to be extracted from these websites. Two feature selection methods, decision tree and wrapper, were used to select the best feature subset; the wrapper method yielded a detection accuracy as high as 96.32%. After feature selection, two algorithms were implemented to predict and detect fraudulent websites: harmony search (HS), deployed on the basis of a nonlinear regression technique, and support vector machine (SVM). The nonlinear regression approach was used to classify the websites, with the parameters of the regression model obtained using the HS algorithm. The proposed HS algorithm uses a dynamic pitch adjustment rate when generating new harmonies. The HS-based nonlinear regression achieved accuracy rates of 94.13% and 92.80% on the training and test sets, respectively. The study therefore finds that HS-based nonlinear regression outperforms SVM.
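The idea of fitting a nonlinear regression classifier with harmony search can be illustrated with a minimal sketch. The abstract does not give the regression model, the pitch adjustment schedule, or any parameter values, so everything below is an assumption made for illustration: the `tanh(w . x + b)` model, the linearly increasing pitch adjustment rate, the values of `hms`, `hmcr`, `bandwidth` and `bound`, and the synthetic toy data with features in {-1, 0, 1} and labels in {-1, +1}. It is not the authors' implementation.

```python
import math
import random


def predict(params, x):
    """Assumed model (not from the paper): tanh(w . x + b), a score in (-1, 1)."""
    *weights, bias = params
    return math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)


def mse(params, X, y):
    """Mean squared error between model scores and the -1/+1 labels."""
    return sum((predict(params, xi) - yi) ** 2 for xi, yi in zip(X, y)) / len(X)


def harmony_search(X, y, n_params, hms=20, hmcr=0.9, par_min=0.1, par_max=0.9,
                   bandwidth=0.2, iterations=2000, bound=5.0, seed=0):
    """Minimise mse() over the model parameters with a basic harmony search."""
    rng = random.Random(seed)
    # Harmony memory: hms random parameter vectors and their costs.
    memory = [[rng.uniform(-bound, bound) for _ in range(n_params)]
              for _ in range(hms)]
    costs = [mse(h, X, y) for h in memory]
    for t in range(iterations):
        # Dynamic pitch adjustment rate: a linear ramp (assumed schedule).
        par = par_min + (par_max - par_min) * t / max(iterations - 1, 1)
        new = []
        for j in range(n_params):
            if rng.random() < hmcr:
                # Memory consideration: reuse a stored value for component j.
                value = rng.choice(memory)[j]
                if rng.random() < par:
                    # Pitch adjustment: small random shift of the reused value.
                    value += rng.uniform(-bandwidth, bandwidth)
            else:
                # Random selection from the whole search range.
                value = rng.uniform(-bound, bound)
            new.append(min(bound, max(-bound, value)))
        cost = mse(new, X, y)
        worst = max(range(hms), key=costs.__getitem__)
        if cost < costs[worst]:
            # The new harmony replaces the worst one only if it is better.
            memory[worst], costs[worst] = new, cost
    best = min(range(hms), key=costs.__getitem__)
    return memory[best], costs[best]


def accuracy(params, X, y):
    """Fraction of samples whose score sign matches the label."""
    hits = sum((1 if predict(params, xi) >= 0 else -1) == yi
               for xi, yi in zip(X, y))
    return hits / len(X)


def make_toy_data(n=200, seed=1):
    """Synthetic stand-in data; the labelling rule is invented for this sketch."""
    rng = random.Random(seed)
    X = [[rng.choice([-1, 0, 1]) for _ in range(3)] for _ in range(n)]
    y = [1 if 2 * x[0] + x[1] - x[2] + 0.5 > 0 else -1 for x in X]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]
    params, cost = harmony_search(X_train, y_train, n_params=4)
    print("train accuracy:", accuracy(params, X_train, y_train))
    print("test accuracy:", accuracy(params, X_test, y_test))
```

The sketch minimises a squared error on the training split and thresholds the score at zero to classify; a real study would replace the toy data with the extracted website features and tune the search parameters.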

Author information

Corresponding author

Correspondence to Mohammad Pourmahmood Aghababa.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Babagoli, M., Aghababa, M.P. & Solouk, V. Heuristic nonlinear regression strategy for detecting phishing websites. Soft Comput 23, 4315–4327 (2019). https://doi.org/10.1007/s00500-018-3084-2

  • DOI: https://doi.org/10.1007/s00500-018-3084-2
