Abstract
The problem of spam short message (SMS) recognition involves many aspects of natural language processing. A good solution to solving the problem can not only improve the quality of people experiencing the mobile life, but also has a positive role on promoting the analysis of short text occurring in current mobile applications, such as Webchat and microblog. As spam SMSes have characteristics of sparsity, transformation and real-timedness, we propose three methods at different levels, i.e., recognition based on symbolic features, recognition based on text similarity, and recognition based on pattern matching. By combining these methods, we obtain a multi-level approach to spam SMS recognition. In order to enrich the pattern base to reduce manual labor and time, we propose a quasi-pattern learning method, which utilizes quasi-pattern matching results in the pattern matching process. The method can learn many interesting and new patterns from the SMS corpus. Finally, a comprehensive analysis indicates that our spam SMS recognition approach achieves a precision rate as high as 95.18%, and a recall rate of 95.51%.
Similar content being viewed by others
References
Chen Y W. The research of treatment for spam message in China. Dissertation for the Doctoral Degree. Shanghai: Shanghai Jiao Tong University, 2010
Huang L Y. On the countermeasures of junk message. Journal of Chongqing University of Posts and Telecommunications (Social Science Edition), 2010, 3: 25–30
Jia X Z. A study on legal governance of spam messages in China. Dissertation for the Doctoral Degree. Changchun: Jilin University, 2013
Yi Y F. Principles and implementation of spam short message monitoring. Zhongxing Telecom Technology, 2005, 11(6): 49–54
Zhang Y, Fu J M. Identifying and trace backing short message spam. Application Research of Computers, 2006, 23(3): 245–247
Wang B, Pan WF. A survey of content-based anti-spam email filtering. Journal of Chinese Information Processing, 2006, 19(5): 1–10
Shan G Y, Fan X H, Yang Y X. Short message service system security analysis. Information Network Security, 2003, 11: 52–54
Shi J. An effective spam short message filtering system. Dissertation for the Doctoral Degree. Chengdu: University of Electronic Science and Technology of China, 2010
Wang R, Tan W. Management of spam SMS based on big data mining. Telecom Engineering Technics and Standardization, 2015, 2: 78–82
Qian Q, Wan B. Spam messages intercept strategy research based on the generalized digit. China New Communication, 2015, 4: 42–43
Zhang Y J, Liu J L, Gao S B. Spam short message classifier model based on association rules. Journal of Nantong University (Natural Science Edition), 2014, 3: 6–12
Sun D. Application and implementation of Hadoop cloud computing technology in junk message filtering. Netinfo Security, 2015, 7: 13–19
Uysal A K, Gunal S, Ergin S, Gunal E S. A novel framework for SMS spam filtering. In: Proceedings of 2012 International Symposium on Innovations in Intelligent Systems and Applications (INISTA). 2012
Duan L Z, Li N, Huang L J. A new spam short message classification. In: Proceedings of the 1st International Workshop on Education Technology and Computer Science. 2009
Rafique M Z, Farooq M. SMS SPAM detection by operating on bytelevel distributions using hidden markov models. In: Proceedings of the 20th Virus Bulletin International Conference. 2010
Chen K X, Chen J Y. An improved spam short message filtering technology based on the naive Bayesian algorithm. Fujian Computer, 2014, 3: 42–43
Wu N N, Wu M G, Chen S. Real-time monitoring and filtering system for mobile SMS. In: Proceedings of the 3rd IEEE Conference on Industrial Electronics and Applications. 2008
Ma N. Research on content based spam short message identifying. Dissertation for the Doctoral Degree. Beijing: Beijing University of Posts and Telecommunications, 2014
Huang W L. Research on key techniques of spam short message filtering. Dissertation for the Doctoral Degree. Hangzhou: Zhejiang University, 2008
Li Y T. Research on spam short message text classification algorithm. Heilongjiang Science and Technology Information, 2015, 19: 144
Gong C C. Research on short text language computing. Dissertation for the Doctoral Degree. Beijing: The Institute of Computing Technology of the Chinese Academy of Sciences, 2008
Ma X, XuWR, Guo J, Hu R L. SMS-2008: an annotated Chinese short messages corpus. Journal of Chinese Information, 2009, 23(4): 22–26
He X. Design and implementation of junk short message filtering system. Dissertation for the Doctoral Degree. Chengdu: University of Electronic Science and Technology of China, 2009
Li H, Zhang Y, Lu H. Junk SMS filtering based on context. Computer Engineering, 2008, 34(12): 154–156
Author information
Authors and Affiliations
Corresponding author
Additional information
Weimin Wang received his PhD at the Graduate University of the Chinese Academy of Sciences, China in 2008. He is now working as a lecturer at Jiangsu University of Science and Technology, China. His research interests include data mining, natural language processing, information retrieval and ontology engineering.
Dan Zhou is now pursuing his master degree in University of Chinese Academy of Sciences, China. His research interests include machine learning, natural language processing and ontology engineering.
Electronic supplementary material
Rights and permissions
About this article
Cite this article
Wang, W., Zhou, D. A multi-level approach to highly efficient recognition of Chinese spam short messages. Front. Comput. Sci. 12, 135–145 (2018). https://doi.org/10.1007/s11704-016-5415-8
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11704-016-5415-8