DL Meets P2P – Distributed Document Retrieval Based on Classification and Content

  • Conference paper
Research and Advanced Technology for Digital Libraries (ECDL 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3652)

Abstract

Peer-to-peer (P2P) architectures are a potentially powerful paradigm for retrieving documents over networks of digital libraries, avoiding single points of failure through massive federation of independent information sources. Sharing files over P2P infrastructures is already immensely successful today, but it is restricted to simple metadata matching. Retrieving complex documents, however, requires the capabilities that digital libraries provide, because digital libraries have to cope with compound documents. Although some document parts (such as embedded images) can be retrieved efficiently by metadata matching, text-based information needs different methods, such as full-text search. Effective text querying in turn depends on information such as inverse document frequencies. In a P2P network, the distributed setting makes such 'collection-wide' information a severe problem: for example, central updates triggered by every change in any document collection use up valuable bandwidth. We present a novel indexing technique that allows querying with collection-wide information with respect to different classifications, and we show the effectiveness of our scheme for practical applications. We discuss our findings in detail and present simulations of the scheme's efficiency and scalability.
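
To illustrate why inverse document frequency is 'collection-wide' information, the following minimal Python sketch compares an IDF value computed from one peer's local collection with the value obtained after merging document frequencies from two peers. It is not the indexing technique of the paper; the peer collections, the function names `document_frequencies` and `idf`, and the smoothed formula log((1 + N) / (1 + df)) are assumptions chosen only for illustration.

```python
import math
from collections import Counter


def document_frequencies(docs):
    """Count, for each term, how many documents in `docs` contain it."""
    df = Counter()
    for doc in docs:
        # Use a set so a term is counted once per document.
        df.update(set(doc.lower().split()))
    return df


def idf(term, df, n_docs):
    """Smoothed inverse document frequency: log((1 + N) / (1 + df))."""
    return math.log((1 + n_docs) / (1 + df.get(term, 0)))


# Hypothetical peers, each holding a small local collection.
peer_a = ["peer to peer retrieval", "peer networks and routing"]
peer_b = ["digital library retrieval", "library metadata", "full text search"]

df_a = document_frequencies(peer_a)
df_b = document_frequencies(peer_b)

# Collection-wide statistics require merging counts from every peer.
df_global = df_a + df_b
n_global = len(peer_a) + len(peer_b)

local_idf = idf("peer", df_a, len(peer_a))    # "peer" occurs in every local document
global_idf = idf("peer", df_global, n_global)  # but is rarer across both peers

print(f"local IDF of 'peer':  {local_idf:.3f}")
print(f"global IDF of 'peer': {global_idf:.3f}")
```

In this toy setting the term "peer" carries no weight when only peer A's documents are considered, but it becomes discriminative once both collections are taken into account. Any change to either collection alters the merged counts, which is the update cost that the abstract identifies as a bandwidth problem.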



© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Balke, WT., Nejdl, W., Siberski, W., Thaden, U. (2005). DL Meets P2P – Distributed Document Retrieval Based on Classification and Content. In: Rauber, A., Christodoulakis, S., Tjoa, A.M. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2005. Lecture Notes in Computer Science, vol 3652. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551362_34

  • DOI: https://doi.org/10.1007/11551362_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28767-4

  • Online ISBN: 978-3-540-31931-3

  • eBook Packages: Computer Science, Computer Science (R0)
