CBR-PDS: a case-based reasoning phishing detection system

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

Phishing attacks have become the preferred vehicle for gathering sensitive information and for delivering dangerous malware. No existing phishing detection system can perfectly distinguish phishing websites from legitimate ones while progressively adapting itself to new attacks. This paper proposes the case-based reasoning phishing detection system (CBR-PDS), which relies on previous cases to detect phishing attacks. CBR-PDS is highly adaptive and dynamic: it can adapt to detect new phishing attacks using a rather small dataset, in contrast to other machine learning techniques. CBR-PDS aims to improve detection accuracy and the reliability of the results by identifying a set of discriminative features and discarding irrelevant ones. To do so, it relies on a two-stage hybrid feature selection procedure that uses information gain and genetic algorithms. Reducing the data dimensionality improves the accuracy rate while also reducing processing time. CBR-PDS is tested under different scenarios on various balanced datasets. The results clearly show the suitability of the proposed hybrid feature selection procedure as well as the efficiency of the proposed CBR-PDS system. The obtained accuracy rates exceed 95%. We also show that integrating an Online Phishing Threats component into the CBR-PDS system further improves the accuracy rate. Finally, the performance of CBR-PDS is compared to that of several well-known competitive classifiers.
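The abstract does not give implementation details, so the following is only a minimal sketch of what a two-stage selection (information gain filter, then genetic search) in front of case retrieval could look like. Everything in it is an assumption made for illustration: the function names, the toy data, the simple-matching similarity, the leave-one-out fitness, and the genetic operators and parameters are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the cases on one discrete feature."""
    remainder = 0.0
    for value in set(row[feature] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

def retrieve_label(query, case_base, features):
    """CBR retrieve/reuse: label of the stored case matching the query on most features."""
    def similarity(case):
        return sum(case[0][f] == query[f] for f in features)
    return max(case_base, key=similarity)[1]

def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of case retrieval restricted to `features`."""
    if not features:
        return 0.0
    hits = 0
    for i, (row, lab) in enumerate(zip(rows, labels)):
        others = [(r, l) for j, (r, l) in enumerate(zip(rows, labels)) if j != i]
        hits += retrieve_label(row, others, features) == lab
    return hits / len(rows)

def select_features(rows, labels, keep=4, generations=20, pop_size=10, seed=0):
    """Stage 1: keep the `keep` features with the highest information gain.
    Stage 2: genetic search over bit masks selecting subsets of those features."""
    rng = random.Random(seed)
    ranked = sorted(range(len(rows[0])),
                    key=lambda f: information_gain(rows, labels, f), reverse=True)
    candidates = ranked[:keep]

    def decode(mask):
        return [f for f, bit in zip(candidates, mask) if bit]

    def fitness(mask):
        return loo_accuracy(rows, labels, decode(mask))

    population = [[rng.randint(0, 1) for _ in candidates] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]        # truncation selection, parents survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(candidates))  # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(len(candidates))] ^= 1  # flip one bit (mutation)
            children.append(child)
        population = parents + children
    return sorted(decode(max(population, key=fitness)))

# Toy case base (hypothetical): feature 0 separates the classes, the rest are noise.
rows = [(1, 0, 1, 0, 1), (1, 1, 0, 0, 0), (1, 0, 0, 1, 1), (1, 1, 1, 1, 0),
        (0, 0, 1, 0, 0), (0, 1, 0, 0, 1), (0, 0, 0, 1, 0), (0, 1, 1, 1, 1)]
labels = ["phishing"] * 4 + ["legitimate"] * 4
selected = select_features(rows, labels)
```

In this sketch the filter stage is cheap and only ranks features one at a time, while the genetic stage evaluates whole subsets against the retrieval step, which is one plausible reading of why a hybrid procedure would reduce dimensionality without sacrificing accuracy.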

Acknowledgements

The authors extend their appreciation to the Deanship of Scientific Research at King Saud University for funding this work through research Group No. RG-1439-023.

Author information

Corresponding author

Correspondence to Abdelfettah Belghith.

About this article

Cite this article

Abutair, H., Belghith, A. & AlAhmadi, S. CBR-PDS: a case-based reasoning phishing detection system. J Ambient Intell Human Comput 10, 2593–2606 (2019). https://doi.org/10.1007/s12652-018-0736-0

  • DOI: https://doi.org/10.1007/s12652-018-0736-0
