DOI: 10.1145/1851476.1851502
research-article

New caching techniques for web search engines

Published: 21 June 2010

Abstract

This paper proposes a cache hierarchy that enables Web search engines to process user queries efficiently. The caches in the hierarchy store pieces of data that are useful for solving frequent queries. Cached items range from specific data, such as query answers, to generic data, such as segments of the index retrieved from secondary memory. The paper also presents a comparative study based on discrete-event simulation and bulk-synchronous parallelism. The performance metrics studied include overall query throughput, single-user query latency, and power consumption. In all cases, the results show that the proposed cache hierarchy performs better than a baseline approach built on state-of-the-art caching techniques.
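
The range of cached items described in the abstract, from query answers down to index segments, can be illustrated with a minimal Python sketch. This is not the paper's design: the two levels shown, the class names (`LRUCache`, `TwoLevelSearchCache`), the LRU eviction policy, the conjunctive (AND) query semantics, and the in-memory dictionary standing in for secondary memory are all assumptions made for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the least recently used


class TwoLevelSearchCache:
    """Hypothetical two-level hierarchy: query answers above index segments."""

    def __init__(self, disk_index, result_capacity, list_capacity):
        # disk_index maps term -> list of document ids; it stands in for
        # the index held in secondary memory (an assumption of this sketch).
        self.disk_index = disk_index
        self.results = LRUCache(result_capacity)  # specific data: query answers
        self.lists = LRUCache(list_capacity)      # generic data: index segments
        self.disk_reads = 0                       # counts simulated disk accesses

    def _posting_list(self, term):
        cached = self.lists.get(term)
        if cached is not None:
            return cached
        self.disk_reads += 1
        postings = self.disk_index.get(term, [])
        self.lists.put(term, postings)
        return postings

    def search(self, query):
        # Normalize so that "web cache" and "Cache WEB" share one cache key.
        key = " ".join(sorted(set(query.lower().split())))
        answer = self.results.get(key)
        if answer is not None:
            return answer  # answered from the top-level cache
        terms = key.split()
        if not terms:
            return []
        # Conjunctive query: intersect the posting lists of all terms.
        docs = set(self._posting_list(terms[0]))
        for term in terms[1:]:
            docs &= set(self._posting_list(term))
        answer = sorted(docs)
        self.results.put(key, answer)
        return answer


index = {"web": [1, 2, 3], "cache": [2, 3, 4], "search": [3, 5]}
engine = TwoLevelSearchCache(index, result_capacity=2, list_capacity=2)
```

In this sketch a repeated query is served entirely from the answer cache, while a new query that shares a term with an earlier one still avoids one simulated disk read through the index-segment cache. The `disk_reads` counter makes that difference observable.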



    Published In

    HPDC '10: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
    June 2010
    911 pages
    ISBN:9781605589428
    DOI:10.1145/1851476

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. caching strategies
    2. discrete-event simulation
    3. models for parallel computing
    4. query processing
    5. web search engines

    Qualifiers

    • Research-article

    Conference

    HPDC '10

    Acceptance Rates

    Overall Acceptance Rate 166 of 966 submissions, 17%
