DOI: 10.1145/1148170.1148232
Article

Load balancing for term-distributed parallel retrieval

Published: 06 August 2006

Abstract

Large-scale web and text retrieval systems deal with amounts of data that greatly exceed the capacity of any single machine. To handle the necessary data volumes and query throughput rates, parallel systems are used, in which the document and index data are split across tightly-clustered distributed computing systems. The index data can be distributed either by document or by term. In this paper we examine methods for load balancing in term-distributed parallel architectures, and propose a suite of techniques for reducing net querying costs. In combination, the techniques we describe allow a 30% improvement in query throughput when tested on an eight-node parallel computer system.
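In a term-distributed index, each node holds the complete postings lists for its share of the vocabulary. A query's work therefore lands on whichever nodes own its terms, and a few frequently queried terms with long postings lists can overload one node while the others sit idle. The Python sketch below illustrates that imbalance and one generic remedy. It is not necessarily the authors' technique, since the abstract does not detail their methods. It assumes, hypothetically, that the cost of a term is its query-log frequency multiplied by its postings-list length, and it compares a naive round-robin placement of terms with a greedy longest-processing-time (LPT) placement. The term names, list lengths, and query log are made-up illustrative values. The helpers `term_workloads`, `assign_round_robin`, `assign_greedy`, and `node_loads` are hypothetical names introduced only for this example.

```python
from collections import Counter
import heapq


def term_workloads(postings_lengths, query_log):
    """Estimate per-term work as (queries containing the term) * (postings-list length).

    This cost model is an illustrative assumption, not a measured one.
    """
    freq = Counter(term for query in query_log for term in set(query))
    return {term: freq[term] * postings_lengths.get(term, 0) for term in freq}


def assign_round_robin(workloads, num_nodes):
    """Naive baseline: deal terms out to nodes in alphabetical order, ignoring cost."""
    return {term: i % num_nodes for i, term in enumerate(sorted(workloads))}


def assign_greedy(workloads, num_nodes):
    """Greedy LPT: place the costliest remaining term on the least-loaded node."""
    heap = [(0, node) for node in range(num_nodes)]  # (current load, node id)
    heapq.heapify(heap)
    assignment = {}
    # Sort by descending cost; break ties by term name for a deterministic result.
    for term, cost in sorted(workloads.items(), key=lambda kv: (-kv[1], kv[0])):
        load, node = heapq.heappop(heap)
        assignment[term] = node
        heapq.heappush(heap, (load + cost, node))
    return assignment


def node_loads(workloads, assignment, num_nodes):
    """Total estimated work that lands on each node under an assignment."""
    loads = [0] * num_nodes
    for term, node in assignment.items():
        loads[node] += workloads[term]
    return loads


# Hypothetical toy data: postings-list lengths and a tiny query log.
postings_lengths = {"web": 900, "search": 700, "query": 600, "index": 500,
                    "term": 400, "load": 300, "balance": 250, "node": 150}
query_log = [["web", "search"], ["load", "balance"], ["web", "index"],
             ["query", "term"], ["web", "search", "query"], ["node", "load"]]

workloads = term_workloads(postings_lengths, query_log)
greedy = node_loads(workloads, assign_greedy(workloads, 2), 2)
naive = node_loads(workloads, assign_round_robin(workloads, 2), 2)
print("round-robin:", naive)  # round-robin: [2450, 4750]
print("greedy LPT:", greedy)  # greedy LPT: [3600, 3600]
```

On this toy data the greedy placement splits the 7,200 units of estimated work evenly across two nodes, while round-robin leaves one node with nearly twice the load of the other. A real system would also have to account for terms that co-occur in the same queries, for drift in the query stream over time, and for the cost of moving postings lists between nodes.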

Published In

SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
August 2006
768 pages
ISBN:1595933697
DOI:10.1145/1148170
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGIR06: The 29th Annual International SIGIR Conference
August 6 - 11, 2006
Seattle, Washington, USA

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics & Citations

Article Metrics

  • Downloads (Last 12 months): 1
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 17 Feb 2025

Cited By

  • (2023) A Query-Based Weighted Document Partitioning Method for Load Balancing in Search Engines. Wireless Personal Communications 129:3, pp. 1489-1511. DOI: 10.1007/s11277-023-10176-y. Online publication date: 20-Mar-2023.
  • (2022) Parallelism-Optimizing Data Placement for Faster Data-Parallel Computations. Proceedings of the VLDB Endowment 16:4, pp. 760-771. DOI: 10.14778/3574245.3574260. Online publication date: 1-Dec-2022.
  • (2020) Improving Load Balance via Resource Exchange in Large-Scale Search Engines. Proceedings of the 49th International Conference on Parallel Processing, pp. 1-11. DOI: 10.1145/3404397.3404402. Online publication date: 17-Aug-2020.
  • (2019) Cache-aware load balancing of data center applications. Proceedings of the VLDB Endowment 12:6, pp. 709-723. DOI: 10.14778/3311880.3311887. Online publication date: 1-Feb-2019.
  • (2019) Resource-Efficient Index Shard Replication in Large Scale Search Engines. IEEE Transactions on Parallel and Distributed Systems 30:12, pp. 2820-2835. DOI: 10.1109/TPDS.2019.2924423. Online publication date: 1-Dec-2019.
  • (2019) Hybrid capacity planning methodology for web search engines. Simulation Modelling Practice and Theory 93, pp. 148-163. DOI: 10.1016/j.simpat.2018.09.016. Online publication date: May-2019.
  • (2018) Index Shard Replication Strategies for Improving Resource Utilization in Large Scale Search Engines. Proceedings of the 47th International Conference on Parallel Processing, pp. 1-10. DOI: 10.1145/3225058.3225102. Online publication date: 13-Aug-2018.
  • (2017) Small-Term Distribution for Disk-Based Search. Proceedings of the 2017 ACM Symposium on Document Engineering, pp. 49-58. DOI: 10.1145/3103010.3103022. Online publication date: 31-Aug-2017.
  • (2017) Caching-Aware Techniques for Query Workload Partitioning in Parallel Search Engines. 2017 14th Web Information Systems and Applications Conference (WISA), pp. 44-49. DOI: 10.1109/WISA.2017.33. Online publication date: Nov-2017.
  • (2016) Load-Balancing in Distributed Selective Search. Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 905-908. DOI: 10.1145/2911451.2914689. Online publication date: 7-Jul-2016.
