DOI: 10.1145/2911451.2911520
Research article

A Comparison of Cache Blocking Methods for Fast Execution of Ensemble-based Score Computation

Published: 07 July 2016

Abstract

Machine-learned classification and ranking techniques often use ensembles to aggregate the partial scores of feature vectors for high accuracy, and runtime score computation can become expensive when a large ensemble is employed. Previous work has shown that judicious use of the memory hierarchy in a modern CPU architecture can effectively shorten score computation time. However, different traversal methods and blocking parameter settings exhibit different cache and cost behavior depending on data and architectural characteristics, and an exhaustive search for performance comparison and optimum selection is very time-consuming. This paper provides an approximate analytic comparison of the data access performance of cache blocking methods and proposes a fast guided sampling scheme that selects a traversal method and blocking parameters for effective use of the memory hierarchy. Evaluation studies with three datasets show that, within a reasonable amount of time, the proposed scheme can identify a highly competitive solution that significantly accelerates score calculation.
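
Cache blocking in this setting means partitioning the scoring computation over blocks of feature vectors and blocks of ensemble trees so that data brought into cache is reused before it is evicted. The following is a minimal C++ sketch of one such 2D-blocked traversal, not the paper's actual algorithm or data layout; the function blocked_score, the toy Tree type, and the block sizes d_block and t_block are illustrative assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy decision stump standing in for a full regression tree; a real scorer
// would use deeper trees with their own node layout.
struct Tree {
    int feature = 0;
    float threshold = 0.0f;
    double left = 0.0, right = 0.0;
    double score(const std::vector<float>& f) const {
        return f[feature] <= threshold ? left : right;
    }
};

// 2D cache blocking: score d_block documents against t_block trees at a
// time so the feature vectors and trees touched by the inner loops are
// reused while they are still resident in cache (block sizes must be >= 1).
void blocked_score(const std::vector<std::vector<float>>& docs,
                   const std::vector<Tree>& trees,
                   std::vector<double>& scores,
                   std::size_t d_block, std::size_t t_block) {
    scores.assign(docs.size(), 0.0);
    for (std::size_t d0 = 0; d0 < docs.size(); d0 += d_block) {
        const std::size_t d1 = std::min(d0 + d_block, docs.size());
        for (std::size_t t0 = 0; t0 < trees.size(); t0 += t_block) {
            const std::size_t t1 = std::min(t0 + t_block, trees.size());
            // Swapping the document and tree loops gives a different
            // traversal order with different cache behavior.
            for (std::size_t d = d0; d < d1; ++d)
                for (std::size_t t = t0; t < t1; ++t)
                    scores[d] += trees[t].score(docs[d]);
        }
    }
}
```

Different traversal methods correspond to different orderings of these loops, and the block sizes are the blocking parameters whose cache and cost behavior the paper compares analytically and tunes with guided sampling.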



    Published In

    SIGIR '16: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval
    July 2016
    1296 pages
    ISBN:9781450340694
    DOI:10.1145/2911451


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 07 July 2016


    Author Tags

    1. cache locality
    2. ensemble methods
    3. query processing

    Qualifiers

    • Research-article

    Funding Sources

    • NSF

    Conference

    SIGIR '16

    Acceptance Rates

    SIGIR '16 Paper Acceptance Rate: 62 of 341 submissions (18%)
    Overall Acceptance Rate: 792 of 3,983 submissions (20%)


    Cited By

    • (2024) SilvanForge: A Schedule-Guided Retargetable Compiler for Decision Tree Inference. Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles, 488-504. DOI: 10.1145/3694715.3695958. Online publication date: 4-Nov-2024.
    • (2022) Treebeard: An Optimizing Compiler for Decision Tree Based ML Inference. Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture, 494-511. DOI: 10.1109/MICRO56248.2022.00043. Online publication date: 1-Oct-2022.
    • (2021) A simple and efficient storage format for SIMD-accelerated SpMV. Cluster Computing, 24(4), 3431-3448. DOI: 10.1007/s10586-021-03340-1. Online publication date: 1-Dec-2021.
    • (2021) PAS: A new powerful and simple quantum computing simulator. Software: Practice and Experience, 53(1), 142-159. DOI: 10.1002/spe.3049. Online publication date: 29-Oct-2021.
    • (2019) Accelerated Query Processing Via Similarity Score Prediction. Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 485-494. DOI: 10.1145/3331184.3331207. Online publication date: 18-Jul-2019.
    • (2019) Joint Optimization of Cascade Ranking Models. Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, 15-23. DOI: 10.1145/3289600.3290986. Online publication date: 30-Jan-2019.
    • (2018) RapidScorer. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 941-950. DOI: 10.1145/3219819.3219857. Online publication date: 19-Jul-2018.
    • (2018) Universal Approximation Functions for Fast Learning to Rank. The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, 1017-1020. DOI: 10.1145/3209978.3210137. Online publication date: 27-Jun-2018.
    • (2018) Query Driven Algorithm Selection in Early Stage Retrieval. Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, 396-404. DOI: 10.1145/3159652.3159676. Online publication date: 2-Feb-2018.
    • (2017) Efficient Cost-Aware Cascade Ranking in Multi-Stage Retrieval. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 445-454. DOI: 10.1145/3077136.3080819. Online publication date: 7-Aug-2017.
