research-article

Trusting spam reporters: A reporter-based reputation system for email filtering

Published: 23 December 2008

Abstract

Spam is a growing problem; it interferes with valid email and burdens both email users and service providers. In this work, we propose a reactive spam-filtering system based on reporter reputation for use in conjunction with existing spam-filtering techniques. The system has a trust-maintenance component for users, based on their spam-reporting behavior. The challenge that we consider is that of maintaining a reliable system, not vulnerable to malicious users, that will provide early spam-campaign detection to reduce the costs incurred by users and systems. We report on the utility of a reputation system for spam filtering that makes use of the feedback of trustworthy users. We evaluate our proposed framework, using actual complaint feedback from a large population of users, and validate its spam-filtering performance on a collection of real email traffic over several weeks. To test the broader implication of the system, we create a model of the behavior of malicious reporters, and we simulate the system under various assumptions using a synthetic dataset.

      • Published in

        ACM Transactions on Information Systems, Volume 27, Issue 1
        December 2008
        208 pages
        ISSN: 1046-8188
        EISSN: 1558-2868
        DOI: 10.1145/1416950

        Copyright © 2008 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 23 December 2008
        • Accepted: 1 March 2008
        • Revised: 1 August 2007
        • Received: 1 March 2007

        Qualifiers

        • research-article
        • Research
        • Refereed
