User Action Based Adaptive Learning with Weighted Bayesian Classification for Filtering Spam Mail

Kim, Hyun-Jun; Shrestha, Jenu; Kim, Heung-Nam; Jo, Geun-Sik

doi:10.1007/11941439_83

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4304)

Included in the following conference series:

Australasian Joint Conference on Artificial Intelligence

Abstract

E-mail is one of the most important communication methods, but most users suffer from spam. Much research has addressed this problem, and earlier approaches show comparatively high performance, but they need several improvements before they can be applied in the real world. First, they need personalized learning for better performance: spam cannot be strictly defined, because what counts as spam depends on each user. Second, they must handle concept drift and interest drift, that is, changes over time in the concept of a category or in a user's interests. Many spam filtering systems therefore use continuous learning schemes such as adaptive learning or incremental learning. However, these systems require users to supply feedback or ratings manually, and this inconvenience slows both learning and performance improvement. In this research, we developed an adaptive learning system based on automatic weighting. For the automatic weighting, we categorized six user patterns (actions) in the mail system, whose weights are adapted automatically in the learning phase. In our experiments, we demonstrate Bayesian classification in an adaptive learning environment and compare the results of the proposed ideas with adaptive learning. Finally, using real-world data sets, we show the potential of the approach for tracking concept drift and interest drift.
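
The idea of letting user actions weight a Bayesian classifier can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names neither the six actions nor their weights, so the action names, labels, and weight values in `ACTION_EVIDENCE` are hypothetical placeholders, as is the `WeightedNaiveBayes` class itself. The sketch treats each user action as implicit feedback and adds its weight, instead of a count of 1, to the token statistics of a naive Bayes model.

```python
import math
from collections import defaultdict

# Hypothetical table: user action -> (implied label, evidence weight).
# The abstract does not list the paper's six actions or their weights;
# these names and numbers are assumptions made only for illustration.
ACTION_EVIDENCE = {
    "reply": ("ham", 1.0),
    "forward": ("ham", 0.8),
    "read_and_keep": ("ham", 0.5),
    "delete_after_reading": ("spam", 0.3),
    "delete_unread": ("spam", 0.7),
    "report_spam": ("spam", 1.0),
}

LABELS = ("spam", "ham")


class WeightedNaiveBayes:
    """Naive Bayes whose counts are fractional weights taken from user actions."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant
        self.token_weight = {label: defaultdict(float) for label in LABELS}
        self.total_tokens = {label: 0.0 for label in LABELS}
        self.class_weight = {label: 0.0 for label in LABELS}
        self.vocab = set()

    def observe(self, text, action):
        """Update the model incrementally from one message and the user's action on it."""
        label, weight = ACTION_EVIDENCE[action]
        self.class_weight[label] += weight
        for token in text.lower().split():
            self.token_weight[label][token] += weight
            self.total_tokens[label] += weight
            self.vocab.add(token)

    def log_score(self, text, label):
        """Return the unnormalized log posterior of `label` for `text`."""
        total = sum(self.class_weight.values())
        prior = (self.class_weight[label] + self.alpha) / (total + len(LABELS) * self.alpha)
        score = math.log(prior)
        denom = self.total_tokens[label] + self.alpha * len(self.vocab)
        for token in text.lower().split():
            if token in self.vocab:  # tokens never seen in training are skipped
                seen = self.token_weight[label].get(token, 0.0)
                score += math.log((seen + self.alpha) / denom)
        return score

    def classify(self, text):
        return max(LABELS, key=lambda label: self.log_score(text, label))


if __name__ == "__main__":
    nb = WeightedNaiveBayes()
    nb.observe("cheap pills buy now", "report_spam")
    nb.observe("meeting agenda for monday", "reply")
    print(nb.classify("buy cheap pills"))
```

Because `observe` is incremental, the model keeps learning as the user handles mail, with no explicit rating step. A scheme for tracking drift would also need some way to discount old evidence, which this sketch leaves out.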

Author information

Authors and Affiliations

Corporate Technology Operations, R&D IT Infra Group, Samsung Electronics, 416, Maetan-3Dong, Yeongtong-Gu, Gyeonggi-Do, Suwon-City, 443-742, Korea
Hyun-Jun Kim
Intelligent E-Commerce Systems Lab., School of Computer Science & Engineering, Inha University, 253 Younghyun-dong, Nam-Gu, Incheon, 402-751, Korea
Hyun-Jun Kim, Jenu Shrestha, Heung-Nam Kim & Geun-Sik Jo

Editor information

Editors and Affiliations

DisPRR, National ICT Australia Ltd, QLD, Australia
Abdul Sattar
School of Computing, University of Tasmania, Sandy Bay, 7005, Tasmania, Australia
Byeong-ho Kang

Cite this paper

Kim, HJ., Shrestha, J., Kim, HN., Jo, GS. (2006). User Action Based Adaptive Learning with Weighted Bayesian Classification for Filtering Spam Mail. In: Sattar, A., Kang, Bh. (eds) AI 2006: Advances in Artificial Intelligence. AI 2006. Lecture Notes in Computer Science, vol 4304. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11941439_83

DOI: https://doi.org/10.1007/11941439_83
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49787-5
Online ISBN: 978-3-540-49788-2
eBook Packages: Computer Science, Computer Science (R0)