A multi-level approach to highly efficient recognition of Chinese spam short messages

Wang, Weimin; Zhou, Dan

doi:10.1007/s11704-016-5415-8

Research Article
Published: 25 May 2017

Volume 12, pages 135–145, (2018)

Frontiers of Computer Science

Abstract

Recognizing spam short messages (SMS) draws on many aspects of natural language processing. A good solution not only improves the quality of mobile users' experience but also helps advance the analysis of the short texts found in current mobile applications, such as Webchat and microblogs. Because spam SMSes are characterized by sparsity, transformation, and a real-time nature, we propose three methods at different levels: recognition based on symbolic features, recognition based on text similarity, and recognition based on pattern matching. Combining these methods yields a multi-level approach to spam SMS recognition. To enrich the pattern base while reducing manual labor and time, we propose a quasi-pattern learning method that uses the quasi-pattern matching results produced during pattern matching. The method can learn many new and interesting patterns from the SMS corpus. Finally, a comprehensive analysis indicates that our spam SMS recognition approach achieves a precision of 95.18% and a recall of 95.51%.
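
The abstract names three recognition levels but does not specify how they are implemented or combined. The Python sketch below is only one possible reading, written under loud assumptions: the levels are tried in order and the first one that fires decides; the symbol-ratio feature, the character-bigram Jaccard similarity, the regular-expression patterns, the thresholds, and the sample messages are all hypothetical stand-ins and are not taken from the article.

```python
import re
from typing import Callable, List, Optional, Set

# Hypothetical data: these samples, patterns, and thresholds are illustrative
# assumptions, not values reported in the article.
KNOWN_SPAM = ["免费领取大奖 请点击链接", "低价代开发票 联系电话"]
SPAM_PATTERNS = [re.compile(r"代开.{0,4}发票"), re.compile(r"免费.{0,6}领取")]
SYMBOL_THRESHOLD = 0.4
SIMILARITY_THRESHOLD = 0.5


def symbol_ratio(text: str) -> float:
    """Fraction of characters that are neither alphanumeric (CJK included) nor whitespace."""
    if not text:
        return 0.0
    symbols = sum(1 for ch in text if not ch.isalnum() and not ch.isspace())
    return symbols / len(text)


def bigrams(text: str) -> Set[str]:
    """Character bigrams of the text with whitespace removed."""
    compact = "".join(text.split())
    return {compact[i:i + 2] for i in range(len(compact) - 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# Each level returns True (spam) or None (undecided, defer to the next level).
def level_symbolic(text: str) -> Optional[bool]:
    return True if symbol_ratio(text) > SYMBOL_THRESHOLD else None


def level_similarity(text: str) -> Optional[bool]:
    grams = bigrams(text)
    best = max((jaccard(grams, bigrams(s)) for s in KNOWN_SPAM), default=0.0)
    return True if best >= SIMILARITY_THRESHOLD else None


def level_pattern(text: str) -> Optional[bool]:
    return True if any(p.search(text) for p in SPAM_PATTERNS) else None


LEVELS: List[Callable[[str], Optional[bool]]] = [
    level_symbolic,
    level_similarity,
    level_pattern,
]


def is_spam(text: str) -> bool:
    """Run the levels in order; the first level that fires decides, else not spam."""
    for level in LEVELS:
        verdict = level(text)
        if verdict is not None:
            return verdict
    return False
```

In this sketch a message made mostly of symbols is caught at the first level, a near-copy of a known spam message at the second, and a reworded message that still matches a pattern at the third. Ordering the cheaper checks first is a design choice of the sketch, not a claim about the authors' system.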

Author information

Authors and Affiliations

School of Computer Science & Engineering, Jiangsu University of Science and Technology, Jiangsu, 212003, China
Weimin Wang
School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing, 100190, China
Dan Zhou

Corresponding author

Correspondence to Weimin Wang.

Additional information

Weimin Wang received his PhD at the Graduate University of the Chinese Academy of Sciences, China in 2008. He is now working as a lecturer at Jiangsu University of Science and Technology, China. His research interests include data mining, natural language processing, information retrieval and ontology engineering.

Dan Zhou is pursuing his master's degree at the University of Chinese Academy of Sciences, China. His research interests include machine learning, natural language processing and ontology engineering.

Electronic supplementary material

Supplementary material, approximately 188 KB.

About this article

Cite this article

Wang, W., Zhou, D. A multi-level approach to highly efficient recognition of Chinese spam short messages. Front. Comput. Sci. 12, 135–145 (2018). https://doi.org/10.1007/s11704-016-5415-8

Received: 06 October 2015
Accepted: 29 January 2016
Published: 25 May 2017
Issue Date: February 2018
DOI: https://doi.org/10.1007/s11704-016-5415-8
