DOI: 10.1145/1458469.1458483
research-article

Cost-effective spam detection in p2p file-sharing systems

Published: 30 October 2008

Abstract

Spam is pervasive in P2P file-sharing systems and is difficult to detect automatically before a file is actually downloaded, because the description of a file returned to a client as a query result is insufficient and biased. To alleviate this problem, we propose a probing technique that collects more complete feature information about query results from the network, and we apply feature-based ranking to automatically detect spam in P2P query result sets. Furthermore, we examine the tradeoff between spam detection performance and network cost, and explore different ways of probing to reduce that cost. Experimental results show that the proposed techniques decrease the amount of spam by 9% in the top-200 results and by 92% in the top-20 results at reasonable cost.
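The abstract's idea of probing followed by feature-based ranking can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `QueryResult` type, the simulated `network` dictionary, the `probe_budget` parameter, and especially the scoring rule (distinct descriptor terms per replica) are hypothetical and are not taken from the paper, which does not list its features in this abstract.

```python
from dataclasses import dataclass


@dataclass
class QueryResult:
    file_id: str       # hypothetical file identifier, e.g. a content hash
    terms: set         # descriptor terms returned with the query hit
    replicas: int = 1  # number of descriptors seen so far for this file


def probe(result, network):
    """Merge every extra descriptor the simulated network holds for a file.

    `network` maps file_id -> list of term sets and stands in for real
    probe messages. Returns the number of descriptors fetched (the cost).
    """
    extra = network.get(result.file_id, [])
    for terms in extra:
        result.terms = result.terms | terms
        result.replicas += 1
    return len(extra)


def spam_score(result):
    """Assumed heuristic: distinct terms per replica (higher = more suspicious)."""
    return len(result.terms) / result.replicas


def rank_with_probing(results, network, probe_budget):
    """Probe the first `probe_budget` results, then sort least suspicious first."""
    cost = 0
    for result in results[:probe_budget]:
        cost += probe(result, network)
    ranked = sorted(results, key=spam_score)
    return ranked, cost
```

Lowering `probe_budget` reduces the returned cost but leaves more results scored on their initial, incomplete descriptors, which is one simple way to picture the tradeoff between detection performance and network cost that the abstract describes.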

Cited By

  • (2012) Economic Evaluation of Interactive Audio Media for Securing Internet Services. Global Security, Safety and Sustainability & e-Democracy, pp. 46-53. DOI: 10.1007/978-3-642-33448-1_7. Online publication date: 2012
  • (2009) Workshop on large-scale distributed systems for information retrieval. ACM SIGIR Forum 43:1, pp. 42-48. DOI: 10.1145/1670598.1670606. Online publication date: 25-Jun-2009

    Published In

    LSDS-IR '08: Proceedings of the 2008 ACM workshop on Large-Scale distributed systems for information retrieval
    October 2008
    90 pages
    ISBN:9781605582542
    DOI:10.1145/1458469
    Program Chairs: Sebastian Michel, Gleb Skobeltsyn, Wai Gen Yee
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. detection
    2. p2p search
    3. spam

    Qualifiers

    • Research-article

    Conference

    CIKM08
    CIKM08: Conference on Information and Knowledge Management
    October 30, 2008
    Napa Valley, California, USA

    Acceptance Rates

    Overall Acceptance Rate 3 of 5 submissions, 60%
