Ensemble Approach to Classify Spam SMS from Bengali Text

  • Conference paper
Advances in Computing and Data Sciences (ICACDS 2023)

Abstract

The Short Message Service (SMS) is a popular communication tool, but it has security weaknesses, notably the influx of spam messages from cybercriminals. Although several studies have addressed filtering and categorizing spam messages in various languages, including English, little research has been done on detecting spam in Bengali (endonym Bangla) text. This study aims to fill that gap by classifying Bengali SMS messages as either spam or ham (legitimate messages). As baseline models, the study used machine learning algorithms including a support vector machine (SVM) with a linear kernel, decision tree (DT), logistic regression (LR), and random forest (RF) with various parameters. Ensemble approaches (bagging, boosting, and stacking) were then used to enhance the performance of the models. The results show that the ensemble approach successfully identified spam messages in Bengali text, with XGBoost producing the most favorable outcome. The contribution of this study lies in its focus on Bengali text and its demonstration of the ensemble method’s performance on a small dataset. The tool developed in this study can support a secure and efficient SMS service by reducing the burden of spam messages and improving the overall user experience. It can also be marketed as a value-added service for customers who are concerned about the security of their personal and financial information. Overall, this study highlights the importance of machine learning algorithms, specifically ensemble methods, in detecting spam messages in Bengali text and contributes to the field of SMS security.
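The workflow the abstract describes (baseline classifiers first, then bagging, boosting, and stacking ensembles) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify feature extraction, so TF-IDF over character n-grams is assumed here; `GradientBoostingClassifier` stands in for XGBoost to keep the sketch dependency-free; the toy messages are hypothetical English placeholders rather than the Bengali dataset; and the function name `fit_and_predict` and all hyperparameters are illustrative.

```python
from sklearn.ensemble import (
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical placeholder corpus (NOT the paper's data): 6 spam, 6 ham.
TEXTS = [
    "win a free prize now call this number",
    "congratulations you won cash claim today",
    "free recharge offer click the link",
    "urgent your account is blocked send pin",
    "lowest loan rates apply now free",
    "claim your lottery prize before midnight",
    "are you coming home for dinner",
    "meeting moved to ten tomorrow",
    "please call me when you reach",
    "happy birthday see you tonight",
    "i left the keys on the table",
    "can you send me the class notes",
]
LABELS = ["spam"] * 6 + ["ham"] * 6


def build_models(seed=0):
    """Return baseline and ensemble classifiers keyed by name."""
    return {
        # Baselines named in the abstract.
        "svm_linear": LinearSVC(),
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=50, random_state=seed),
        # Bagging: bootstrap-aggregated decision trees.
        "bagging": BaggingClassifier(
            DecisionTreeClassifier(random_state=seed),
            n_estimators=20,
            random_state=seed,
        ),
        # Boosting: gradient-boosted trees (assumed stand-in for XGBoost).
        "boosting": GradientBoostingClassifier(random_state=seed),
        # Stacking: base learners feed a logistic-regression meta-learner.
        "stacking": StackingClassifier(
            estimators=[
                ("svm", LinearSVC()),
                ("dt", DecisionTreeClassifier(random_state=seed)),
            ],
            final_estimator=LogisticRegression(max_iter=1000),
            cv=2,
        ),
    }


def fit_and_predict(train_texts, train_labels, new_texts, seed=0):
    """Fit every model on TF-IDF features and label the new messages."""
    predictions = {}
    for name, model in build_models(seed).items():
        # Assumed featurization: character n-grams within word boundaries.
        vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
        pipeline = make_pipeline(vectorizer, model)
        pipeline.fit(train_texts, train_labels)
        predictions[name] = list(pipeline.predict(new_texts))
    return predictions


if __name__ == "__main__":
    results = fit_and_predict(TEXTS, LABELS, ["free prize call now", "see you at dinner"])
    for name, labels in results.items():
        print(f"{name}: {labels}")
```

Character n-grams are used here so the sketch does not depend on a word tokenizer suited to Bengali script; that choice is an assumption for illustration. Scoring on held-out messages, rather than the training set, would be needed before comparing the models in any meaningful way.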

Author information

Corresponding author

Correspondence to Zeyar Aung.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Al Maruf, A., Al Numan, A., Haque, M.M., Jidney, T.T., Aung, Z. (2023). Ensemble Approach to Classify Spam SMS from Bengali Text. In: Singh, M., Tyagi, V., Gupta, P., Flusser, J., Ören, T. (eds) Advances in Computing and Data Sciences. ICACDS 2023. Communications in Computer and Information Science, vol 1848. Springer, Cham. https://doi.org/10.1007/978-3-031-37940-6_36

  • DOI: https://doi.org/10.1007/978-3-031-37940-6_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-37939-0

  • Online ISBN: 978-3-031-37940-6

  • eBook Packages: Computer Science (R0)
