Trusting spam reporters: A reporter-based reputation system for email filtering

Authors:
Elena Zheleva, University of Maryland, College Park, College Park, MD
Aleksander Kolcz, Microsoft Live Labs, Redmond, WA
Lise Getoor, University of Maryland, College Park, College Park, MD

ACM Transactions on Information Systems, Volume 27, Issue 1, Article No. 3, pp. 1–27. https://doi.org/10.1145/1416950.1416953

Published: 23 December 2008

Abstract

Spam is a growing problem; it interferes with valid email and burdens both email users and service providers. In this work, we propose a reactive spam-filtering system based on reporter reputation for use in conjunction with existing spam-filtering techniques. The system has a trust-maintenance component for users, based on their spam-reporting behavior. The challenge that we consider is that of maintaining a reliable system, not vulnerable to malicious users, that will provide early spam-campaign detection to reduce the costs incurred by users and systems. We report on the utility of a reputation system for spam filtering that makes use of the feedback of trustworthy users. We evaluate our proposed framework, using actual complaint feedback from a large population of users, and validate its spam-filtering performance on a collection of real email traffic over several weeks. To test the broader implication of the system, we create a model of the behavior of malicious reporters, and we simulate the system under various assumptions using a synthetic dataset.

Index Terms

Trusting spam reporters: A reporter-based reputation system for email filtering
1. Computing methodologies
  1. Modeling and simulation
    1. Simulation theory
      1. Systems theory
2. Mathematics of computing
  1. Information theory

Published in

ACM Transactions on Information Systems Volume 27, Issue 1
December 2008
208 pages
ISSN:1046-8188
EISSN:1558-2868
DOI:10.1145/1416950

Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 23 December 2008
- Accepted: 1 March 2008
- Revised: 1 August 2007
- Received: 1 March 2007
Author Tags
Spam filtering
reputation systems
trust
Qualifiers
- research-article
- Research
- Refereed
