DPTree: A Distributed Pattern Tree Index for Partial-Match Queries in Peer-to-Peer Networks

  • Conference paper
Advances in Database Technology - EDBT 2006 (EDBT 2006)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 3896))

Abstract

Partial-match queries return data items that contain a subset of the query keywords and rank the results by the statistical properties of the matched keywords. They are essential for information retrieval on large document repositories. However, most current peer-to-peer networks for information retrieval are based on distributed hashing and therefore cannot support partial-match queries efficiently. In this paper, we describe an efficient and scalable technique to support partial-match queries on peer-to-peer networks. We observe that the keyword combinations that appear in queries are only a small subset of all possible combinations of the keywords in the documents. We therefore propose a distributed index structure, called a distributed pattern tree (DPTree), to record frequent query patterns, i.e., combinations of keywords, learnt from the query history at each node in the network. Using this index, a query can identify its best matching patterns quickly, and data lookup can be done in logarithmic time with respect to the network size. Our simulation studies on the TREC data sets show promising results in comparison with previous approaches.
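To make the notion of a partial-match query concrete, the following is a minimal single-machine sketch in Python. It is not the DPTree algorithm and is not taken from the paper: the function names, the toy documents, and the choice of a smoothed inverse document frequency as the "statistical property" used for ranking are all assumptions made for illustration only.

```python
import math
from collections import defaultdict


def build_inverted_index(docs):
    """Map each keyword to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in set(text.lower().split()):
            index[word].add(doc_id)
    return index


def partial_match(query, index, n_docs):
    """Return (doc_id, score) pairs for documents that contain at least
    one query keyword, ordered by descending score.

    Assumption: the score is the sum of a smoothed inverse document
    frequency over the matched keywords, so rarer keywords count more.
    """
    scores = defaultdict(float)
    for keyword in set(query.lower().split()):
        postings = index.get(keyword, ())
        if not postings:
            continue  # keyword matches no document
        idf = math.log(1 + n_docs / len(postings))
        for doc_id in postings:
            scores[doc_id] += idf
    # Sort by score (highest first); break ties by document id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical toy corpus, for illustration only.
    docs = {
        "d1": "peer to peer index",
        "d2": "distributed hash index",
        "d3": "pattern tree index",
    }
    index = build_inverted_index(docs)
    for doc_id, score in partial_match("distributed pattern tree", index, len(docs)):
        print(doc_id, round(score, 3))
```

In this sketch a document need not contain every query keyword to be returned, which is exactly what a pure hash lookup on the full keyword set cannot provide. How such a lookup is distributed across peers is the subject of the paper and is not modelled here.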

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhao, D.J., Lee, D.L., Luo, Q. (2006). DPTree: A Distributed Pattern Tree Index for Partial-Match Queries in Peer-to-Peer Networks. In: Ioannidis, Y., et al. Advances in Database Technology - EDBT 2006. EDBT 2006. Lecture Notes in Computer Science, vol 3896. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11687238_32

  • DOI: https://doi.org/10.1007/11687238_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32960-2

  • Online ISBN: 978-3-540-32961-9

  • eBook Packages: Computer Science (R0)
