DOI: 10.1145/3508072.3508109
research-article

Hybrid CNN-GRU Framework with Integrated Pre-trained Language Transformer for SMS Phishing Detection

Published: 13 April 2022

ABSTRACT

Smartphones are increasingly exposed to SMS phishing as Internet-connected mobile technologies become more widely available. Detecting phishing SMS is challenging because SMS text is unstructured and exhibits complex, non-linear correlations. Drawing on recent advances in cybersecurity, we propose a hybrid deep learning framework that extracts robust features from SMS texts and then automatically detects phishing SMS. By combining the capabilities of individual models in one hybrid framework, it outperforms various individual machine learning and deep learning models. The proposed phishing detection framework combines a pre-trained transformer model, MPNet (Masked and Permuted Language Modeling), with supervised convolutional networks (CNN) and bidirectional Gated Recurrent Units (GRU). It is intended to detect short, unstructured phishing text messages that contain complex patterns.
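The abstract names the components but not how they are wired together, so the following is only a minimal sketch of one plausible arrangement: token embeddings pass through a 1D convolution, then a bidirectional GRU, then a sigmoid output. Everything concrete here is an assumption rather than the authors' configuration: the layer order, the sizes, the GRU formulation, and all function and parameter names (`conv1d_relu`, `gru_step`, `build_params`, `phishing_probability`) are hypothetical. MPNet itself is not run; seeded random vectors stand in for its token embeddings, and the weights are untrained, so the score is meaningless as a prediction.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def conv1d_relu(seq, kernels, bias):
    """Valid 1D convolution over time, then ReLU.
    seq: T vectors of size D; kernels: F filters, each k x D."""
    k = len(kernels[0])
    out = []
    for t in range(len(seq) - k + 1):
        window = seq[t:t + k]
        row = []
        for f, kern in enumerate(kernels):
            s = bias[f] + sum(kern[i][d] * window[i][d]
                              for i in range(k) for d in range(len(window[i])))
            row.append(max(0.0, s))
        out.append(row)
    return out


def gru_step(p, x, h):
    """One GRU step: update gate z, reset gate r, candidate state n."""
    z = [sigmoid(a + b + c) for a, b, c in
         zip(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"])]
    r = [sigmoid(a + b + c) for a, b, c in
         zip(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"])]
    rh = [ri * hi for ri, hi in zip(r, h)]
    n = [math.tanh(a + b + c) for a, b, c in
         zip(matvec(p["Wn"], x), matvec(p["Un"], rh), p["bn"])]
    return [zi * hi + (1.0 - zi) * ni for zi, hi, ni in zip(z, h, n)]


def build_params(embed_dim, n_filters, kernel, hidden, seed=0):
    """Random, untrained weights (assumption: sizes are illustrative only)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    def gru():
        p = {}
        for g in "zrn":
            p["W" + g] = mat(hidden, n_filters)
            p["U" + g] = mat(hidden, hidden)
            p["b" + g] = [0.0] * hidden
        return p

    return {
        "kernels": [mat(kernel, embed_dim) for _ in range(n_filters)],
        "conv_bias": [0.0] * n_filters,
        "fwd": gru(),
        "bwd": gru(),
        "w_out": [rng.uniform(-0.1, 0.1) for _ in range(2 * hidden)],
        "b_out": 0.0,
    }


def phishing_probability(token_embeddings, params):
    """Embeddings -> CNN -> bidirectional GRU -> sigmoid score in (0, 1)."""
    feats = conv1d_relu(token_embeddings, params["kernels"], params["conv_bias"])
    hidden = len(params["fwd"]["bz"])
    h_f = [0.0] * hidden
    for x in feats:                      # left-to-right pass
        h_f = gru_step(params["fwd"], x, h_f)
    h_b = [0.0] * hidden
    for x in reversed(feats):            # right-to-left pass
        h_b = gru_step(params["bwd"], x, h_b)
    logit = params["b_out"] + sum(w * h for w, h in zip(params["w_out"], h_f + h_b))
    return sigmoid(logit)


# Stand-in for MPNet output: 12 tokens, 8-dimensional random embeddings.
rng = random.Random(1)
fake_embeddings = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(12)]
params = build_params(embed_dim=8, n_filters=4, kernel=3, hidden=5)
prob = phishing_probability(fake_embeddings, params)
print(f"untrained score: {prob:.3f}")
```

In a real system the random vectors would be replaced by embeddings from a pre-trained MPNet checkpoint and the weights would be learned from labeled SMS data; the pure-Python loops are used here only to keep the data flow visible without external dependencies.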


Published in

ICFNDS 2021: The 5th International Conference on Future Networks & Distributed Systems
December 2021
847 pages
ISBN: 9781450387347
DOI: 10.1145/3508072

Copyright © 2021 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

