Abstract
Peer-to-peer architectures are a potentially powerful paradigm for retrieving documents over networks of digital libraries avoiding single points of failure by massive federation of (independent) information sources. Today sharing files over P2P infrastructures is already immensely successful, but restricted to simple metadata matching. But when it comes to the retrieval of complex documents, capabilities as provided by digital libraries are needed. Digital libraries have to cope with compound documents. Though some document parts (like embedded images) can efficiently be retrieved using metadata matching, the text-based information needs different methods like full text search. However, for effective querying of texts, also information like inverted document frequencies are essential. But due to the distributed characteristics of P2P networks such ’collection-wide’ information poses severe problems, e.g. that central updates whenever changes in any document collection occur use up valuable bandwidth. We will present a novel indexing technique that allows to query using collection-wide information with respect to different classifications and show the effectiveness of our scheme for practical applications. We will in detail discuss our findings and present simulations for the scheme’s efficiency and scalability.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Aberer, K.: P-grid: A self-organizing access structure for p2p information systems. In: In Proceedings of the Sixth International Conference on Cooperative Information Systems (CoopIS), Trento, Italy (2001)
Balke, W.-T., Nejld, W., Siberski, W., Thaden, U.: Progressive distributed top-k retrieval in peer-to-peer networks. In: Proceedings of the 21st International Conference on Data Engineering (ICDE 2005) (2005)
Cuenca-Acuna, F.M., Peery, C., Martin, R.P., Nguyen, T.D.: PlanetP: Using Gossiping to Build Content Addressable Peer-to-Peer Information Sharing Communities. In: Twelfth IEEE International Symposium on High Performance Distributed Computing (HPDC-12), June 2003. IEEE Press, Los Alamitos (2003)
Korfhage, R.: Information Storage and Retrieval. John Wiley, New York (1997)
Li, Y., Bandar, Z.A., McLean, D.: An approach for measuring semantic similarity between words using multiple information sources. IEEE Transactions on Knwoledge and Data Engineering 15(4) (2003)
Lu, J., Callan, J.: Federated search of text-based digital libraries in hierarchical peer-to-peer networks. In: Losada, D.E., Fernández-Luna, J.M. (eds.) ECIR 2005. LNCS, vol. 3408, pp. 52–66. Springer, Heidelberg (2005)
Nejld, W., Siberski, W., Thaden, U., Balke, W.-T.: Top-k query evaluation for schema-based peer-to-peer networks. In: McIlraith, S.A., Plexousakis, D., van Harmelen, F. (eds.) ISWC 2004. LNCS, vol. 3298, pp. 137–151. Springer, Heidelberg (2004)
Ratnasamy, S., Francis, P., Handley, M., Karp, R., Shenker, S.: A scalable content addressable network. In: Proceedings of the 2001 Conference on applications, technologies, architectures, and protocols for computer communications. ACM Press, New York (2001)
Schlosser, M., Sintek, M., Decker, S., Nejdl, W.: HyperCuP—Hypercubes, Ontologies and Efficient Search on P2P Networks. In: Moro, G., Koubarakis, M. (eds.) AP2PC 2002. LNCS (LNAI), vol. 2530, pp. 112–124. Springer, Heidelberg (2003)
Siberski, W., Thaden, U.: A simulation framework for schema-based query routing in p2p-networks. In: 1st International Workshop on Peer-to-Peer Computing & DataBases(P2P& DB 2004) (2004)
Stoica, I., Morris, R., Karger, D., Kaashoek, M.F., Balakrishnan, H.: Chord: A scalable peer-to-peer lookup service for internet applications. In: Proceedings of the 2001 Conference on applications, technologies, architectures, and protocols for computer communications. ACM Press, New York (2001)
Tang, C., Xu, Z., Mahalingam, M.: Peersearch: Efficient information retrieval in peer-peer networks. Technical Report HPL-2002-198, Hewlett-Packard Labs (2002)
Viles, C.L., French, J.C.: Dissemination of collection wide information in a distributed information retrieval system. In: Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 12–20. ACM Press, New York (1995)
Viles, C.L., French, J.C.: On the update of term weights in dynamic information retrieval systems. In: Proceedings of the 1995 International Conference on Information and Knowledge Management (CIKM), pp. 167–174. ACM Press, New York (1995)
Wang, C., Li, J., Shi, S.: Cell abstract indices for content-based approximate query processing in structured peer-to-peer data systems. In: Yu, J.X., Lin, X., Lu, H., Zhang, Y. (eds.) APWeb 2004. LNCS, vol. 3007, pp. 269–278. Springer, Heidelberg (2004)
Witten, I., Moffat, A., Bell, T.: Managing Gigabytes. Morgan Kaufman, Heidelberg (1999)
Yang, B., Garcia-Molina, H.: Designing a super-peer network. In: Proccedings of the 19th International Conference on Data Engineering (ICDE) (2003)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Balke, WT., Nejdl, W., Siberski, W., Thaden, U. (2005). DL Meets P2P – Distributed Document Retrieval Based on Classification and Content. In: Rauber, A., Christodoulakis, S., Tjoa, A.M. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2005. Lecture Notes in Computer Science, vol 3652. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551362_34
Download citation
DOI: https://doi.org/10.1007/11551362_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28767-4
Online ISBN: 978-3-540-31931-3
eBook Packages: Computer ScienceComputer Science (R0)