Achieving High Precisions with Peer-to-Peer Is Possible!

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6203)

Abstract

Until recently, centralized stand-alone solutions had no problem coping with the load of storing, indexing, and searching the small test collections used for evaluating search results at INEX. However, searching the new large-scale Wikipedia collection of 2009 requires far more resources, such as processing power, RAM, and index space. It is hence more important than ever to consider efficiency when performing XML-Retrieval tasks on such a big collection. On the other hand, the rich markup of the new collection is an opportunity to exploit the given structure and obtain a more efficient search. This paper describes our experiments using distributed search techniques based on XML-Retrieval. Our aim is to improve both effectiveness and efficiency; we have thus submitted search results to both the Efficiency Track and the Ad Hoc Track. In our experiments, the collection, index, and search load are split over a peer-to-peer (P2P) network to improve load balancing when searching large-scale collections. Since the bandwidth consumption between searching peers has to be limited in order to achieve a scalable, efficient system, we exploit XML structure to reduce the number of messages sent between peers. Although it aims mainly at efficiency, our search engine SPIRIX achieved quite high precision and made it into the top-10 systems (focused task). It ranked 7th in the Ad Hoc Track (59%) and came first in terms of precision in the Efficiency Track (both categories of topics). For the first time at INEX, a P2P system achieved an official search quality comparable with the top-10 centralized solutions!

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Winter, J., Kühne, G. (2010). Achieving High Precisions with Peer-to-Peer Is Possible!. In: Geva, S., Kamps, J., Trotman, A. (eds) Focused Retrieval and Evaluation. INEX 2009. Lecture Notes in Computer Science, vol 6203. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14556-8_25

  • DOI: https://doi.org/10.1007/978-3-642-14556-8_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14555-1

  • Online ISBN: 978-3-642-14556-8

  • eBook Packages: Computer Science (R0)
