Abstract
Recent work in P2P overlay networks allows for decentralized object location and routing (DOLR) across networks based on unique IDs. In this paper, we propose an extension to DOLR systems that publishes objects using generic feature vectors instead of content-hashed GUIDs, which enables the systems to locate similar objects. We discuss the design of a distributed text similarity engine, named Approximate Text Addressing (ATA), built on top of this extension, that locates objects by their text descriptions. We then outline the design and implementation of a motivating application on ATA, a decentralized spam-filtering service. We evaluate this system with 30,000 real spam email messages and 10,000 non-spam messages, and find a spam identification ratio of over 97% with zero false positives.
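To make the idea of publishing by feature vector rather than by content hash more concrete, the following is a minimal single-process sketch. It is not the ATA design: the abstract does not specify how feature vectors are computed or matched, so the substring-hashing scheme, the `window`, `size`, and `threshold` parameters, and the `ToyIndex` class (an in-memory dictionary standing in for a DOLR overlay) are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def feature_vector(text, window=8, size=10):
    """Assumed scheme: hash every `window`-character substring of the
    normalized text and keep the `size` numerically smallest hashes."""
    normalized = " ".join(text.lower().split())
    last_start = max(1, len(normalized) - window + 1)
    hashes = {
        int.from_bytes(
            hashlib.sha1(normalized[i:i + window].encode("utf-8")).digest()[:8],
            "big",
        )
        for i in range(last_start)
    }
    return sorted(hashes)[:size]


class ToyIndex:
    """Hypothetical stand-in for a DOLR overlay: maps each feature to the
    IDs of the objects published under it."""

    def __init__(self):
        self._by_feature = defaultdict(set)

    def publish(self, object_id, text):
        # Publish the object once per feature instead of once per content hash.
        for feature in feature_vector(text):
            self._by_feature[feature].add(object_id)

    def locate(self, text, threshold=3):
        # Return objects that share at least `threshold` features with the query.
        counts = defaultdict(int)
        for feature in feature_vector(text):
            for object_id in self._by_feature.get(feature, ()):
                counts[object_id] += 1
        return {oid for oid, shared in counts.items() if shared >= threshold}


index = ToyIndex()
index.publish("spam-1", "Buy cheap watches now, limited offer just for you")
same = index.locate("Buy cheap watches now, limited offer just for you")
other = index.locate("Meeting moved to Thursday at noon in room four")
```

In this sketch an identical message shares every feature with the published one and is located, while an unrelated message shares none. A lightly edited copy may or may not be found, depending on how many of its selected substrings survive the edit, which is why the matching threshold is set below the vector size.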
Copyright information
© 2003 IFIP International Federation for Information Processing
About this paper
Cite this paper
Zhou, F., Zhuang, L., Zhao, B.Y., Huang, L., Joseph, A.D., Kubiatowicz, J. (2003). Approximate Object Location and Spam Filtering on Peer-to-Peer Systems. In: Endler, M., Schmidt, D. (eds) Middleware 2003. Middleware 2003. Lecture Notes in Computer Science, vol 2672. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44892-6_1
DOI: https://doi.org/10.1007/3-540-44892-6_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40317-3
Online ISBN: 978-3-540-44892-1
eBook Packages: Springer Book Archive