Abstract
Phishing attacks have become the preferred vehicle for gathering sensitive information and for delivering dangerous malware. No existing phishing detection system can both detect phishing perfectly and progressively adapt itself to distinguish phishing from legitimate websites. This paper proposes the case-based reasoning phishing detection system (CBR-PDS), which relies on previous cases to detect phishing attacks. CBR-PDS is highly adaptive and dynamic: it can adapt to detect new phishing attacks using a rather small dataset, in contrast to other machine learning techniques. CBR-PDS aims to improve detection accuracy and the reliability of its results by identifying a set of discriminative features and discarding irrelevant ones. To this end, it relies on a two-stage hybrid feature selection procedure using information gain and genetic algorithms. Reducing the data dimensionality improves the accuracy rate while also reducing processing time. CBR-PDS is tested under different scenarios on various balanced datasets. The results clearly show the suitability of the proposed hybrid feature selection procedure as well as the efficiency of the proposed CBR-PDS system. The obtained accuracy rates exceed 95%. We also show that integrating an Online Phishing Threats component into CBR-PDS further improves the accuracy rate. Finally, the performance of CBR-PDS is compared with that of several well-known competitive classifiers.
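To make the two ideas in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It ranks features by information gain and then classifies a new website by majority vote over its most similar stored cases, retaining the solved case afterwards. The genetic algorithm stage is omitted, and the toy feature names, the matching-based similarity measure, the choice of k = 3 and the class `CaseBase` are all assumptions made for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(cases, labels, feature):
    """Entropy reduction obtained by splitting the cases on one discrete feature."""
    groups = {}
    for case, label in zip(cases, labels):
        groups.setdefault(case[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def rank_features(cases, labels):
    """Feature indices sorted from most to least informative."""
    return sorted(range(len(cases[0])),
                  key=lambda f: information_gain(cases, labels, f),
                  reverse=True)

class CaseBase:
    """Hypothetical case base: retrieve similar cases, reuse their labels, retain new ones."""

    def __init__(self, features):
        self.features = features   # indices of the selected features
        self.cases = []            # stored (vector, label) pairs

    def retain(self, vector, label):
        self.cases.append((tuple(vector), label))

    def similarity(self, a, b):
        # Assumed measure: fraction of selected features with equal values.
        return sum(a[f] == b[f] for f in self.features) / len(self.features)

    def classify(self, vector, k=3):
        nearest = sorted(self.cases,
                         key=lambda c: self.similarity(vector, c[0]),
                         reverse=True)[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy data; assumed feature order: ip_in_url, at_symbol, uses_https, long_url.
cases = [(1, 1, 0, 1), (1, 0, 1, 1), (1, 1, 1, 0), (1, 0, 0, 0),
         (0, 0, 1, 1), (0, 0, 0, 0), (0, 1, 1, 0), (0, 0, 0, 1)]
labels = ["phishing"] * 4 + ["legitimate"] * 4

selected = rank_features(cases, labels)[:2]   # keep the two most informative features
case_base = CaseBase(selected)
for vector, label in zip(cases, labels):
    case_base.retain(vector, label)

new_site = (1, 0, 1, 0)
prediction = case_base.classify(new_site)
case_base.retain(new_site, prediction)        # the solved case becomes a new case
print(selected, prediction)
```

Retaining each solved case is what lets a case base grow with newly observed attacks without retraining a model, which is the kind of adaptivity the abstract attributes to case-based reasoning.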
Acknowledgements
The authors extend their appreciation to the Deanship of Scientific Research at King Saud University for funding this work through research Group No. RG-1439-023.
Cite this article
Abutair, H., Belghith, A. & AlAhmadi, S. CBR-PDS: a case-based reasoning phishing detection system. J Ambient Intell Human Comput 10, 2593–2606 (2019). https://doi.org/10.1007/s12652-018-0736-0
DOI: https://doi.org/10.1007/s12652-018-0736-0