A Spam Filtering Method Learning from Web Browsing Behavior

  • Conference paper
Knowledge-Based Intelligent Information and Engineering Systems (KES 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5178)

Abstract

This paper proposes a spam filtering method based on the observation that most email users also browse the Web. Because the filter learns from Web browsing behavior in the background, the method reduces the burden of maintaining the spam filter. It uses each user's Web browsing behavior to learn ham words: words are picked from browsed Web pages using TF-IDF and stored in a database called the ham words list. For each received email, the method extracts keywords from the email, including from the Web pages that its URLs point to. If some of the keywords are in the ham words list, the email is treated as ham. In our experiments, several spam emails that a Bayesian filter could not detect were detected as spam.
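
The two steps described in the abstract, learning ham words from browsed pages with TF-IDF and then checking an email's keywords against the ham words list, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the regular-expression tokenizer, the `k` and `min_hits` parameters, and the particular TF-IDF variant (relative term frequency times log inverse document frequency) are all hypothetical choices. It also treats every token of the email as a keyword and does not fetch the pages behind URLs, which the paper's method does.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (assumed tokenizer)."""
    return TOKEN_RE.findall(text.lower())


def top_tfidf_words(pages, k=2):
    """Pool the k highest TF-IDF words of each browsed page into a ham words set."""
    docs = [Counter(tokenize(p)) for p in pages]
    n = len(docs)

    # Document frequency: in how many pages each word occurs.
    df = Counter()
    for d in docs:
        df.update(d.keys())

    ham_words = set()
    for d in docs:
        total = sum(d.values())
        scores = {w: (c / total) * math.log(n / df[w]) for w, c in d.items()}
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        ham_words.update(ranked[:k])
    return ham_words


def is_ham(email_text, ham_words, min_hits=1):
    """Treat the email as ham if at least min_hits of its words are ham words."""
    keywords = set(tokenize(email_text))
    return len(keywords & ham_words) >= min_hits


# Hypothetical browsing history and emails, for illustration only.
pages = [
    "kayak kayak paddle river trip",
    "python compiler parser python syntax",
    "river weather forecast",
]
ham_words = top_tfidf_words(pages, k=2)
print(is_ham("New kayak routes for the weekend", ham_words))
print(is_ham("Cheap pills, click here now", ham_words))
```

In this toy history, a word such as "river" that appears on several pages receives a low inverse document frequency and is less likely to enter the ham words list, while words specific to one page are favored.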

Editor information

Ignac Lovrek, Robert J. Howlett, Lakhmi C. Jain

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Takashita, T., Itokawa, T., Kitasuka, T., Aritsugi, M. (2008). A Spam Filtering Method Learning from Web Browsing Behavior. In: Lovrek, I., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2008. Lecture Notes in Computer Science, vol 5178. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85565-1_96

  • DOI: https://doi.org/10.1007/978-3-540-85565-1_96

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85564-4

  • Online ISBN: 978-3-540-85565-1

  • eBook Packages: Computer Science, Computer Science (R0)
