Peer-to-Peer Non-document Content Searching Method Using User Evaluation of Semantic Vectors

Yoji YAMATO
Hiroshi SUNAGA

Publication
IEICE TRANSACTIONS on Communications, Vol.E89-B, No.9, pp.2309-2318
Publication Date: 2006/09/01
Online ISSN: 1745-1345
DOI: 10.1093/ietcom/e89-b.9.2309
Print ISSN: 0916-8516
Type of Manuscript: Special Section PAPER (Special Section on Networking Technologies for Overlay Networks)
Keyword: peer-to-peer, content retrieval, vector space method, CAN, semantic vector

Summary: 
Advances in peer-to-peer (P2P) techniques have made a large amount of non-document content searchable and usable. In the near future, a huge amount of content will be distributed over networks, so P2P searching, in addition to index-server searching, will become important because of its scalability and robustness. Typical P2P content searching services suffer from several problems: a low search precision ratio, a significant increase in traffic, and floods of malicious content such as viruses. We propose a P2P content searching method that forwards a query only to peers holding indices of content semantically similar to the wanted content, and that does not forward the query to the same peer repeatedly. The method is based on the content addressable network (CAN) topology and a vector space method whose vectors have variable length. It maps non-document content to a vector space based on users' evaluations, and it manages the vector space and routes queries using CAN topology control. The effectiveness of the method is shown by both analytical estimation and simulation experiments. The simulations show that our method improves the precision and recall ratios while reducing traffic compared with Gnutella flooding, a vector space method with fixed vector lengths (similar to the pSearch method), and Chord. In particular, when a large amount of malicious content was present, our method achieved a higher precision ratio than the other methods.
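
The forwarding idea in the summary can be illustrated with a small sketch. It is not the authors' implementation: the summary does not specify the rating scale, the similarity measure, or how vectors of different lengths are compared. The sketch assumes that a semantic vector is the per-axis average of user ratings, that shorter vectors are zero-padded, that similarity is cosine similarity, and that a fixed threshold selects peers. The names `semantic_vector`, `cosine_similarity`, `select_peers`, and `threshold` are hypothetical, and CAN zone management is omitted.

```python
import math


def semantic_vector(evaluations):
    """Average per-axis user ratings into one vector (assumed scheme).

    `evaluations` is a list of rating lists that may differ in length,
    so each axis is averaged over the users who actually rated it.
    """
    length = max((len(e) for e in evaluations), default=0)
    vector = []
    for axis in range(length):
        ratings = [e[axis] for e in evaluations if len(e) > axis]
        vector.append(sum(ratings) / len(ratings))
    return vector


def cosine_similarity(a, b):
    """Cosine similarity, treating missing trailing axes as zero (assumption)."""
    n = max(len(a), len(b))
    a = list(a) + [0.0] * (n - len(a))
    b = list(b) + [0.0] * (n - len(b))
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_peers(query, peer_indices, visited, threshold=0.8):
    """Return peers whose index vector is similar to the query and that
    have not already received it (hypothetical threshold rule)."""
    return sorted(
        peer
        for peer, index_vector in peer_indices.items()
        if peer not in visited
        and cosine_similarity(query, index_vector) >= threshold
    )
```

For example, with peers `{"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1, 0.0]}`, a query `[1.0, 0.0]`, and `visited={"a"}`, only peer `"c"` is selected: `"a"` has already seen the query and `"b"` is orthogonal to it.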