ABSTRACT
Smartphones are increasingly exposed to SMS phishing (smishing) as Internet-connected smart mobile technologies become more widely available. Detecting phishing SMS is challenging because SMS text is unstructured and exhibits complex, non-linear correlations. To address this, and building on recent advances in cybersecurity, we propose a hybrid deep learning framework that extracts robust features from SMS text and then automatically detects phishing messages. By combining the strengths of individual models in a single framework, it outperforms a range of standalone machine learning and deep learning models. The proposed framework combines a pretrained transformer model, MPNet (Masked and Permuted Pre-training for Language Understanding), with supervised convolutional neural networks (CNN) and bidirectional gated recurrent units (Bi-GRU). It is designed to detect short, unstructured phishing text messages that contain complex patterns.
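The hybrid architecture described above can be illustrated with a minimal PyTorch sketch. It is not the authors' implementation: the class name `HybridCnnBiGru`, the layer sizes, the CNN-then-GRU ordering, and the use of the final GRU hidden states are all assumptions made for illustration. The pretrained MPNet encoder is replaced by random tensors of width 768 (the hidden size assumed here for a base-sized encoder), so the example shows only how token embeddings could flow through a CNN and a bidirectional GRU to a phishing score.

```python
import torch
import torch.nn as nn


class HybridCnnBiGru(nn.Module):
    """Hypothetical CNN + bidirectional GRU head over transformer token embeddings."""

    def __init__(self, embed_dim=768, conv_channels=128, kernel_size=3, gru_hidden=64):
        super().__init__()
        # 1-D convolution along the token axis captures local, n-gram-like patterns.
        self.conv = nn.Conv1d(embed_dim, conv_channels, kernel_size,
                              padding=kernel_size // 2)
        # Bidirectional GRU reads the convolved sequence in both directions.
        self.gru = nn.GRU(conv_channels, gru_hidden,
                          batch_first=True, bidirectional=True)
        # One output logit: phishing vs. legitimate.
        self.out = nn.Linear(2 * gru_hidden, 1)

    def forward(self, token_embeddings):
        # token_embeddings: (batch, seq_len, embed_dim)
        x = token_embeddings.transpose(1, 2)           # Conv1d expects (batch, channels, seq_len)
        x = torch.relu(self.conv(x)).transpose(1, 2)   # back to (batch, seq_len, conv_channels)
        _, h_n = self.gru(x)                           # h_n: (2, batch, gru_hidden)
        features = torch.cat([h_n[0], h_n[1]], dim=1)  # forward and backward final states
        return self.out(features).squeeze(-1)          # (batch,) raw logits


torch.manual_seed(0)
model = HybridCnnBiGru()
# Random stand-in for encoder output: 4 messages, 32 tokens each, 768-dim embeddings.
fake_embeddings = torch.randn(4, 32, 768)
logits = model(fake_embeddings)
probs = torch.sigmoid(logits)  # untrained, so these scores carry no meaning yet
```

In practice the random tensor would be replaced by the last hidden states of a pretrained MPNet encoder, and padding tokens would need to be masked before the recurrent layer; both steps are omitted here to keep the sketch short.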
Index Terms
- Hybrid CNN-GRU Framework with Integrated Pre-trained Language Transformer for SMS Phishing Detection