Abstract
Spam is a growing problem; it interferes with valid email and burdens both email users and service providers. In this work, we propose a reactive spam-filtering system based on reporter reputation for use in conjunction with existing spam-filtering techniques. The system has a trust-maintenance component for users, based on their spam-reporting behavior. The challenge that we consider is that of maintaining a reliable system, not vulnerable to malicious users, that provides early spam-campaign detection to reduce the costs incurred by users and systems. We report on the utility of a reputation system for spam filtering that makes use of the feedback of trustworthy users. We evaluate our proposed framework using actual complaint feedback from a large population of users, and validate its spam-filtering performance on a collection of real email traffic over several weeks. To test the broader implications of the system, we create a model of the behavior of malicious reporters and simulate the system under various assumptions using a synthetic dataset.
Index Terms
- Trusting spam reporters: A reporter-based reputation system for email filtering