Two-Dimensional Distributed Inverted Files

  • Conference paper
String Processing and Information Retrieval (SPIRE 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5721)

Abstract

Term-partitioned indexes are generally inefficient for evaluating conjunctive queries, because they require the communication of long posting lists. Document-partitioned indexes, on the other hand, incur excessive overhead because every processor must participate in the evaluation of every query, so they do not scale adequately for real systems. We propose arranging a set of processors in a two-dimensional array, applying term partitioning at the row level and document partitioning at the column level. This paper addresses how to choose the number of rows and columns for a given number of processors, and how to select suitable ways of partitioning the index over that topology.
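
The two-dimensional arrangement described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes one plausible reading of the layout: a term selects a row, a document selects a column, and the posting for a (term, document) pair lives on the processor at that grid position. The hash-based term assignment, the modulo document assignment, and all function names are hypothetical choices made only for illustration.

```python
from zlib import crc32


def term_row(term: str, rows: int) -> int:
    """Assumed term partitioning: hash the term to one of `rows` rows."""
    return crc32(term.encode("utf-8")) % rows


def doc_column(doc_id: int, cols: int) -> int:
    """Assumed document partitioning: spread document ids over `cols` columns."""
    return doc_id % cols


def posting_owner(term: str, doc_id: int, rows: int, cols: int) -> tuple[int, int]:
    """Grid position (row, column) of the processor holding this posting."""
    return term_row(term, rows), doc_column(doc_id, cols)


def processors_for_query(terms: list[str], rows: int, cols: int) -> list[tuple[int, int]]:
    """Processors touched by a query: every column of each row owning a query term."""
    query_rows = {term_row(t, rows) for t in terms}
    return sorted((r, c) for r in query_rows for c in range(cols))


def candidate_grids(processors: int) -> list[tuple[int, int]]:
    """All (rows, cols) shapes that use exactly `processors` processors."""
    return [(r, processors // r) for r in range(1, processors + 1) if processors % r == 0]
```

Under these assumptions the two extremes of the grid recover the classical schemes: with a single row, every query touches all processors, as in a document-partitioned index, and with a single column, each term is served by exactly one processor, as in a term-partitioned index. `candidate_grids` only enumerates the possible shapes; choosing among them is the question the paper studies.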

This research was funded by a Yahoo! Research Alliance Grant.

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Feuerstein, E., Marin, M., Mizrahi, M., Gil-Costa, V., Baeza-Yates, R. (2009). Two-Dimensional Distributed Inverted Files. In: Karlgren, J., Tarhio, J., Hyyrö, H. (eds) String Processing and Information Retrieval. SPIRE 2009. Lecture Notes in Computer Science, vol 5721. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03784-9_20

  • DOI: https://doi.org/10.1007/978-3-642-03784-9_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03783-2

  • Online ISBN: 978-3-642-03784-9

  • eBook Packages: Computer Science (R0)