CBR-PDS: a case-based reasoning phishing detection system

Abutair, Hassan; Belghith, Abdelfettah; AlAhmadi, Saad

doi:10.1007/s12652-018-0736-0

Original Research
Published: 28 February 2018

Volume 10, pages 2593–2606, (2019)
Journal of Ambient Intelligence and Humanized Computing

Hassan Abutair¹,
Abdelfettah Belghith¹ &
Saad AlAhmadi¹

Abstract

Phishing attacks have become the preferred vehicle for gathering sensitive information and delivering dangerous malware. So far, no phishing detection system can perfectly detect phishing websites and progressively self-adapt to differentiate them from legitimate ones. This paper proposes the case-based reasoning phishing detection system (CBR-PDS), which relies on previous cases to detect phishing attacks. CBR-PDS is highly adaptive and dynamic: it can adapt to detect new phishing attacks using a rather small dataset, in contrast to other machine learning techniques. CBR-PDS aims to improve detection accuracy and the reliability of the results by identifying a set of discriminative features and discarding irrelevant ones. To do so, it relies on a two-stage hybrid procedure using information gain and genetic algorithms. Reducing the data dimensionality improves the accuracy rate and also reduces the processing time. CBR-PDS is tested under different scenarios on various balanced datasets. The results clearly show the suitability of the proposed hybrid feature selection procedure as well as the efficiency of the proposed CBR-PDS system. The obtained accuracy rates exceed 95%. We also show that integrating an Online Phishing Threats component into CBR-PDS further improves the accuracy rate. Finally, the performance of CBR-PDS is compared with that of several known competitive classifiers.
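
To make the ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It illustrates only two of the ingredients mentioned: ranking discrete features by information gain, and classifying a new website by retrieving the most similar stored cases. The genetic algorithm stage is omitted. The feature names, the toy case base, the 0.1 threshold, the simple matching similarity, and the majority vote over k cases are all assumptions made for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(cases, labels, feature):
    """Entropy reduction obtained by splitting the cases on one discrete feature."""
    groups = {}
    for case, label in zip(cases, labels):
        groups.setdefault(case[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def select_features(cases, labels, threshold):
    """Keep the features whose information gain exceeds the threshold."""
    return [f for f in cases[0] if information_gain(cases, labels, f) > threshold]

def similarity(a, b, features):
    """Fraction of the selected features on which two cases agree."""
    return sum(a[f] == b[f] for f in features) / len(features)

def classify(case_base, labels, query, features, k=3):
    """Retrieve the k most similar stored cases and reuse their majority label."""
    ranked = sorted(range(len(case_base)),
                    key=lambda i: similarity(case_base[i], query, features),
                    reverse=True)
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy case base: feature names and values are illustrative only.
CASES = [
    {"has_ip_in_url": 1, "long_url": 1, "uses_https": 0},
    {"has_ip_in_url": 1, "long_url": 0, "uses_https": 1},
    {"has_ip_in_url": 1, "long_url": 1, "uses_https": 1},
    {"has_ip_in_url": 0, "long_url": 0, "uses_https": 1},
    {"has_ip_in_url": 0, "long_url": 1, "uses_https": 0},
    {"has_ip_in_url": 0, "long_url": 0, "uses_https": 0},
]
LABELS = ["phishing", "phishing", "phishing",
          "legitimate", "legitimate", "legitimate"]

# Filter stage: discard features that carry little information about the class.
SELECTED = select_features(CASES, LABELS, threshold=0.1)

# Retrieve and reuse: label a new website from its nearest stored cases.
QUERY = {"has_ip_in_url": 1, "long_url": 0, "uses_https": 0}
PREDICTION = classify(CASES, LABELS, QUERY, SELECTED, k=3)

# Retain: store the solved case so that later queries can reuse it.
CASES.append(QUERY)
LABELS.append(PREDICTION)
```

In this toy data only `has_ip_in_url` separates the two classes, so it is the only feature that survives the filter. The retain step at the end is what lets a case-based system adapt over time: each newly solved case enlarges the case base without retraining a model.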

Acknowledgements

The authors extend their appreciation to the Deanship of Scientific Research at King Saud University for funding this work through research Group No. RG-1439-023.

Author information

Authors and Affiliations

College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
Hassan Abutair, Abdelfettah Belghith & Saad AlAhmadi

Corresponding author

Correspondence to Abdelfettah Belghith.

About this article

Cite this article

Abutair, H., Belghith, A. & AlAhmadi, S. CBR-PDS: a case-based reasoning phishing detection system. J Ambient Intell Human Comput 10, 2593–2606 (2019). https://doi.org/10.1007/s12652-018-0736-0

Received: 05 September 2017
Accepted: 12 December 2017
Published: 28 February 2018
Issue Date: 01 July 2019
DOI: https://doi.org/10.1007/s12652-018-0736-0
