DOI: 10.1145/1316902.1316912
research-article

Load balancing distributed inverted files

Published: 09 November 2007

Abstract

This paper presents a comparison of scheduling algorithms applied to load balancing the query traffic on distributed inverted files. We implemented a number of algorithms taken from the literature. We propose a novel method to formulate the cost of query processing so that these algorithms can be used to schedule queries onto processors. We avoid measuring load balance at the search engine side because this can lead to imprecise evaluation. Our method is based on the simulation of a bulk-synchronous parallel computer at the broker machine side. This simulation determines an optimal way of processing the queries and provides a stable baseline upon which both the broker and the search engine can tune their operation in accordance with the observed query traffic. We conclude that the simplest load balancing heuristics are good enough to achieve efficient performance. Our method can be used in practice by broker machines to schedule queries efficiently onto the cluster processors of search engines.
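To make the idea of a "simple load balancing heuristic" concrete, the following is a minimal sketch of greedy least-loaded scheduling and its longest-processing-time-first (LPT) variant, both of which are classical heuristics from the scheduling literature. It is not taken from the paper: the function names, the flat per-query cost values, and the imbalance measure are assumptions made purely for illustration, and the paper's own cost formulation is not reproduced here.

```python
import heapq
from typing import List, Sequence, Tuple


def greedy_schedule(
    costs: Sequence[float], num_procs: int, sort_desc: bool = False
) -> Tuple[List[int], List[float]]:
    """Assign each query to the currently least-loaded processor.

    costs      -- hypothetical per-query processing costs (illustrative only)
    num_procs  -- number of processors available to the broker
    sort_desc  -- if True, handle the most expensive queries first (LPT rule)

    Returns (assignment, loads): assignment[i] is the processor chosen for
    query i, and loads[p] is the total cost placed on processor p.
    """
    if num_procs <= 0:
        raise ValueError("num_procs must be positive")

    order = list(range(len(costs)))
    if sort_desc:
        # LPT: process queries in order of decreasing cost.
        order.sort(key=lambda i: costs[i], reverse=True)

    # Min-heap of (current load, processor id); ties go to the lower id.
    heap = [(0.0, p) for p in range(num_procs)]
    assignment = [-1] * len(costs)
    for i in order:
        load, p = heapq.heappop(heap)
        assignment[i] = p
        heapq.heappush(heap, (load + costs[i], p))

    loads = [0.0] * num_procs
    for load, p in heap:
        loads[p] = load
    return assignment, loads


def imbalance(loads: Sequence[float]) -> float:
    """Ratio of maximum load to mean load; 1.0 means perfectly balanced."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean > 0 else 1.0


if __name__ == "__main__":
    query_costs = [1, 1, 1, 1, 4]  # made-up costs, not measured data
    _, arrival_loads = greedy_schedule(query_costs, 2)
    _, lpt_loads = greedy_schedule(query_costs, 2, sort_desc=True)
    print(arrival_loads, imbalance(arrival_loads))  # [6.0, 2.0] 1.5
    print(lpt_loads, imbalance(lpt_loads))          # [4.0, 4.0] 1.0
```

In this toy input, scheduling in arrival order leaves one processor with three times the load of the other, while sorting by cost first balances them exactly. Whether that ordering step pays off on real query traffic depends on the cost model, which is the question the paper studies.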

References

[1] J. L. Bentley, D. S. Johnson, F. T. Leighton, C. C. McGeoch, and L. A. McGeoch. Some unexpected expected behavior results for bin packing. In STOC, pages 279--288, 1984.
[2] O. J. Boxma. A probabilistic analysis of the LPT scheduling rule. In Performance, pages 475--490, 1984.
[3] E. G. Coffman, Jr., M. R. Garey, and D. S. Johnson. Approximation algorithms (ed. D. Hochbaum), chapter Approximation algorithms for bin packing - a survey. PWS, 1997.
[4] R. L. Graham. Bounds on multiprocessing timing anomalies. SIAM Journal of Applied Mathematics, 17(2):416--429, 1969.
[5] D. S. Johnson, A. J. Demers, J. D. Ullman, M. R. Garey, and R. L. Graham. Worst-case performance bounds for simple one-dimensional packing algorithms. SIAM J. Comput., 3(4):299--325, 1974.
[6] F. T. Leighton and P. W. Shor. Tight bounds for minimax grid matching, with applications to the average case analysis of algorithms. In STOC, pages 91--103, 1986.
[7] A. Moffat, W. Webber, and J. Zobel. Load balancing for term-distributed parallel retrieval. 29th annual international ACM SIGIR conference on Research and development in information retrieval, pages 348--355, 2006.
[8] D. Skillicorn, J. Hill, and W. McColl. Questions and answers about BSP. Technical Report PRG-TR-15-96, Computing Laboratory, Oxford University, 1996. Also in Journal of Scientific Programming, V.6 N.3, 1997.
[9] L. Valiant. A bridging model for parallel computation. Comm. ACM, 33:103--111, Aug. 1990.
[10] J. Zobel and A. Moffat. Inverted files for text search engines. ACM Computing Surveys, 38(2), 2006.

    Published In

    WIDM '07: Proceedings of the 9th annual ACM international workshop on Web information and data management
    November 2007
    168 pages
    ISBN:9781595938299
    DOI:10.1145/1316902
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. inverted files
    2. parallel and distributed computing

    Qualifiers

    • Research-article

    Conference

    CIKM07

    Cited By

    • (2010) A combined semi-pipelined query processing architecture for distributed full-text retrieval. Proceedings of the 11th international conference on Web information systems engineering, pages 587-601. DOI: 10.5555/1991336.1991400. Online publication date: 12-Dec-2010
    • (2010) A Combined Semi-pipelined Query Processing Architecture for Distributed Full-Text Retrieval. 11th International Conference on Web Information Systems Engineering --- WISE 2010 - Volume 6488, pages 587-601. DOI: 10.1007/978-3-642-17616-6_51. Online publication date: 12-Dec-2010
    • (2009) An Index Clustering and Mapping Algorithm for Large Scale Astronomical Data Searching. Proceedings of the 2009 29th IEEE International Conference on Distributed Computing Systems Workshops, pages 318-323. DOI: 10.1109/ICDCSW.2009.56. Online publication date: 22-Jun-2009
    • (2008) Information retrieval from digital libraries in SQL. Proceedings of the 10th ACM workshop on Web information and data management, pages 55-62. DOI: 10.1145/1458502.1458512. Online publication date: 30-Oct-2008
    • (2008) Load Balancing Distributed Inverted Files. Proceedings of the 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008), pages 329-333. DOI: 10.1109/PDP.2008.93. Online publication date: 13-Feb-2008
    • (2008) An Overview of Web Research in Chile. Proceedings of the 2008 Latin American Web Conference, pages 135-143. DOI: 10.1109/LA-WEB.2008.26. Online publication date: 28-Oct-2008
    • (2008) Scheduling Intersection Queries in Term Partitioned Inverted Files. Proceedings of the 14th international Euro-Par conference on Parallel Processing, pages 434-443. DOI: 10.1007/978-3-540-85451-7_47. Online publication date: 26-Aug-2008
