Abstract
The Short Message Service (SMS) is a popular communication tool, but it is vulnerable to abuse, most visibly the influx of spam messages sent by cybercriminals. Several studies have addressed filtering and categorizing spam messages in various languages, including English, but little research has been done on detecting spam in Bengali (endonym Bangla) text. This study aims to fill that gap by classifying Bengali SMS messages as either spam or ham (legitimate messages). As baseline models, the study used machine learning algorithms: a support vector machine (SVM) with a linear kernel, a decision tree (DT), logistic regression (LR), and a random forest (RF) with various parameters. Ensemble approaches (bagging, boosting, and stacking) were then used to improve on the baselines. The results show that the ensemble approach successfully identified spam messages in Bengali text, with XGBoost producing the most favorable outcome. The contribution of this study lies in its focus on Bengali text and its demonstration of the ensemble method’s performance on a small dataset. The tool developed in this study can support a more secure and efficient SMS service by reducing the burden of spam messages on customers and improving the overall user experience. It can also be marketed as a value-added service for customers who are concerned about the security of their personal and financial information. Overall, this study highlights the usefulness of machine learning algorithms, specifically ensemble methods, for detecting spam messages in Bengali text.
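The workflow described in the abstract (baseline classifiers first, then bagging, boosting, and stacking ensembles) might be sketched as follows. This is a minimal illustration and not the study's implementation: the toy messages, the TF-IDF features, the whitespace tokenizer, and every hyperparameter are assumptions, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so that the sketch depends on a single library.

```python
from sklearn.base import clone
from sklearn.ensemble import (
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy corpus, NOT the study's dataset: 1 = spam, 0 = ham.
TEXTS = [
    "অভিনন্দন! আপনি লটারি জিতেছেন, এখনই কল করুন",
    "ফ্রি অফার পেতে এখনই রিপ্লাই করুন",
    "আপনি পুরস্কার জিতেছেন, কল করুন",
    "বিশেষ অফার, ফ্রি ইন্টারনেট পেতে ডায়াল করুন",
    "লটারি জিতেছেন, টাকা পেতে এখনই কল করুন",
    "ফ্রি রিচার্জ অফার, এখনই ডায়াল করুন",
    "আজ বিকেলে দেখা হবে",
    "মা, আমি বাড়ি আসছি",
    "কাল মিটিং কয়টায়",
    "তুমি কেমন আছো",
    "আমি একটু পরে ফোন দিচ্ছি",
    "রাতে খাবার খেয়েছ",
]
LABELS = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]


def with_tfidf(classifier):
    """Wrap a classifier in a TF-IDF pipeline (features are an assumption).

    Tokens are split on whitespace because the default token pattern
    breaks Bengali words at combining vowel signs.
    """
    return make_pipeline(TfidfVectorizer(token_pattern=r"\S+"), classifier)


def build_models():
    """Return the four baselines plus one model per ensemble strategy."""
    base = [
        ("svm_linear", SVC(kernel="linear")),
        ("decision_tree", DecisionTreeClassifier(random_state=0)),
        ("logistic_regression", LogisticRegression()),
        ("random_forest", RandomForestClassifier(n_estimators=50, random_state=0)),
    ]
    models = {name: with_tfidf(clone(est)) for name, est in base}
    # Bagging: many trees, each fit on a bootstrap sample of the data.
    models["bagging"] = with_tfidf(
        BaggingClassifier(
            DecisionTreeClassifier(random_state=0), n_estimators=10, random_state=0
        )
    )
    # Boosting: stand-in for XGBoost (assumption, see lead-in).
    models["boosting"] = with_tfidf(GradientBoostingClassifier(random_state=0))
    # Stacking: a meta-learner combines the baselines' out-of-fold outputs.
    models["stacking"] = with_tfidf(
        StackingClassifier(estimators=base, final_estimator=LogisticRegression(), cv=3)
    )
    return models


if __name__ == "__main__":
    new_message = ["ফ্রি অফার, এখনই কল করুন"]
    for name, model in build_models().items():
        model.fit(TEXTS, LABELS)
        print(name, model.predict(new_message))
```

On a real corpus the models would be compared on a held-out split or with cross-validation rather than on the training messages, and the stand-in booster could be swapped for an XGBoost classifier if that library is available.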
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Al Maruf, A., Al Numan, A., Haque, M.M., Jidney, T.T., Aung, Z. (2023). Ensemble Approach to Classify Spam SMS from Bengali Text. In: Singh, M., Tyagi, V., Gupta, P., Flusser, J., Ören, T. (eds) Advances in Computing and Data Sciences. ICACDS 2023. Communications in Computer and Information Science, vol 1848. Springer, Cham. https://doi.org/10.1007/978-3-031-37940-6_36
DOI: https://doi.org/10.1007/978-3-031-37940-6_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-37939-0
Online ISBN: 978-3-031-37940-6
eBook Packages: Computer Science, Computer Science (R0)