Abstract
Partial-match queries return data items that contain a subset of the query keywords and order the results based on the statistical properties of the matched keywords. They are essential for information retrieval on large document repositories. However, most current peer-to-peer networks for information retrieval are based on distributed hashing and therefore cannot support partial-match queries efficiently. In this paper, we describe an efficient and scalable technique to support partial-match queries on peer-to-peer networks. We observe that the combinations of keywords in the queries are only a small subset of all possible combinations of the keywords in the documents. Therefore, we propose a distributed index structure, called a distributed pattern tree (DPTree), to record frequent query patterns, i.e., combinations of keywords, learnt from the query history at each node in the network. Using this index, a query can identify its best matching patterns quickly, and data lookup can be done in logarithmic time with respect to the network size. Our simulation studies on the TREC data sets show promising results in comparison with previous approaches.
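To make the terms in the abstract concrete, the following is a minimal single-machine sketch in Python. It is not the DPTree itself, and nothing in it is taken from the paper: the IDF-style score, the support threshold, and the function names `partial_match`, `frequent_patterns`, and `best_patterns` are all assumptions chosen for illustration. It shows (1) ranking documents that contain only some of the query keywords, (2) collecting frequent keyword combinations from a query history, and (3) choosing the largest recorded patterns that a new query covers.

```python
import math
from collections import Counter
from itertools import combinations


def partial_match(query, docs):
    """Rank documents that contain at least one query keyword.

    Assumed scoring rule (not from the paper): the sum of IDF-style
    weights of the matched keywords, so rarer keywords count more.
    """
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for words in docs.values() for term in set(words))
    scores = {}
    for doc_id, words in docs.items():
        matched = set(query) & set(words)
        if matched:
            scores[doc_id] = sum(math.log(1 + n / df[t]) for t in matched)
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def frequent_patterns(history, min_support):
    """Return keyword combinations seen in at least min_support past queries.

    Enumerates every non-empty subset of each query, which is only
    practical because queries are short; this is a hypothetical stand-in
    for the pattern learning the abstract mentions.
    """
    counts = Counter()
    for past_query in history:
        terms = sorted(set(past_query))
        for size in range(1, len(terms) + 1):
            counts.update(frozenset(c) for c in combinations(terms, size))
    return {p for p, c in counts.items() if c >= min_support}


def best_patterns(query, patterns):
    """Return the largest recorded patterns fully contained in the query."""
    covered = [p for p in patterns if p <= set(query)]
    if not covered:
        return []
    top = max(len(p) for p in covered)
    return [p for p in covered if len(p) == top]


docs = {
    "d1": ["peer", "network", "index"],
    "d2": ["peer", "query"],
    "d3": ["tree"],
}
history = [["peer", "index"], ["peer", "index", "tree"], ["peer", "query"]]

ranking = partial_match(["peer", "index", "hash"], docs)
patterns = frequent_patterns(history, min_support=2)
best = best_patterns(["peer", "index", "hash"], patterns)
```

In this toy data, the query keyword "hash" appears in no document, yet "d1" and "d2" are still returned because they match part of the query. In a distributed setting, the patterns chosen by `best_patterns` would presumably serve as lookup keys in the overlay, but how the DPTree organizes and routes them is not described in the abstract.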
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhao, D.J., Lee, D.L., Luo, Q. (2006). DPTree: A Distributed Pattern Tree Index for Partial-Match Queries in Peer-to-Peer Networks. In: Ioannidis, Y., et al. Advances in Database Technology - EDBT 2006. EDBT 2006. Lecture Notes in Computer Science, vol 3896. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11687238_32
DOI: https://doi.org/10.1007/11687238_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32960-2
Online ISBN: 978-3-540-32961-9
eBook Packages: Computer Science; Computer Science (R0)