Abstract
Statistical filtering based on Bayes' theorem, known as Bayesian filtering, gained attention after Paul Graham described it in his essay “A Plan for Spam” and has since become a popular mechanism for distinguishing spam from legitimate email. Many modern mail programs use Bayesian spam filtering techniques. Implementations of Bayesian filtering for email written in English and Japanese have already been developed. In contrast, little work has been done on implementing Bayesian spam filtering for Chinese email. In this paper, we first adopt a statistical filter called bsfilter and modify it to filter Chinese email. Using Chinese email as the experimental target, we then analyze the relation between the filter's parameters and its spam-judgement accuracy, and consider the optimal parameter values.
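To make the idea concrete, the following is a minimal sketch of Graham-style Bayesian scoring applied to Chinese text. It is not bsfilter's code and not necessarily the method evaluated in the paper: the character-bigram tokenizer, the function names, and the `unknown` and `top_n` knobs are all assumptions chosen for illustration, and Graham's weighting of legitimate-mail counts is omitted for brevity.

```python
from collections import Counter
from math import prod


def bigrams(text):
    """Split text into overlapping character bigrams (assumed tokenizer)."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


def train(messages):
    """Count, for each token, the number of messages that contain it."""
    counts = Counter()
    for msg in messages:
        counts.update(set(bigrams(msg)))
    return counts


def token_prob(token, spam_counts, ham_counts, n_spam, n_ham, unknown=0.4):
    """Estimate P(spam | token), clamped to [0.01, 0.99]."""
    s = spam_counts[token] / n_spam
    h = ham_counts[token] / n_ham
    if s + h == 0:
        return unknown  # hypothetical default for unseen tokens
    return min(0.99, max(0.01, s / (s + h)))


def spam_score(text, spam_counts, ham_counts, n_spam, n_ham, top_n=15):
    """Combine the top_n most decisive token probabilities into one score."""
    probs = [
        token_prob(t, spam_counts, ham_counts, n_spam, n_ham)
        for t in set(bigrams(text))
    ]
    if not probs:
        return 0.5  # nothing to judge from
    # Keep the tokens whose probability is farthest from the neutral 0.5.
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    probs = probs[:top_n]
    p_spam = prod(probs)
    p_ham = prod(1 - p for p in probs)
    return p_spam / (p_spam + p_ham)


# Toy corpus, invented for this example only.
spam_msgs = ["免费中奖 点击领取", "免费领取 大奖"]
ham_msgs = ["明天开会 请准时", "会议纪要 请查收"]
spam_counts = train(spam_msgs)
ham_counts = train(ham_msgs)
```

A message would then be classified as spam when `spam_score(...)` exceeds some threshold; the threshold, `unknown`, and `top_n` are the kind of tunable values whose effect on accuracy a parameter study might examine.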
References
Graham, P.: A Plan For Spam (August 2002)
Bsfilter, http://bsfilter.org/
CCERT Data Sets of Chinese Emails, http://www.ccert.edu.cn/spam/sa/datasets.htm
Robinson, G.: A statistical approach to the spam problem. Linux Journal 107 (2003)
Graham, P.: Better Bayesian Filtering. In: Spam Conference (2003)
Zhang, L., Zhu, J., Yao, T.: An Evaluation of Statistical Spam Filtering Techniques. ACM Transactions on Asian Language Information Processing 3(4), 243–269 (2004)
Maosong, S., Dayang, S., Changning, H.: CSeg&Tag1.0: A Practical Word Segmenter and POS Tagger for Chinese Texts, A97-1018, A Digital Archive of Research Papers in Computational Linguistics
Hovold, J.: Naive Bayes Spam Filtering Using Word-Position-Based Attributes. In: Second Conference on Email and Anti-Spam, CEAS 2005 (2005)
Iwanaga, M., Tabata, T., Sakurai, K.: Comparison with Implementations of Bayesian Filtering for Anti-spam. In: SCIS 2004, vol. 2, pp. 1025–1028 (2004) (in Japanese)
Ohfuku, H., Matsuura, K.: Optimization of Bayesian filtering for Anti-spam. In: SCIS 2005, vol. 1, pp. 199–204 (2005) (in Japanese)
Support Vector Machine, http://www.support-vector.net/
Boosting, http://www.boosting.org/
Markov Chain, http://www.taygeta.com/rwalks/node7.html
Nie, J.-Y., Ren, F.: Chinese Information Retrieval: Using Characters or Words? Information Processing and Management 35(4), 443–462 (1999)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, Z., Hori, Y., Sakurai, K. (2006). Application and Evaluation of Bayesian Filter for Chinese Spam. In: Lipmaa, H., Yung, M., Lin, D. (eds) Information Security and Cryptology. Inscrypt 2006. Lecture Notes in Computer Science, vol 4318. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11937807_20
DOI: https://doi.org/10.1007/11937807_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49608-3
Online ISBN: 978-3-540-49610-6
eBook Packages: Computer Science, Computer Science (R0)