DOI: 10.1145/2911451.2911520
Research article

A Comparison of Cache Blocking Methods for Fast Execution of Ensemble-based Score Computation

Published: 07 July 2016

Abstract

Machine-learned classification and ranking techniques often use ensembles to aggregate the partial scores of feature vectors for high accuracy, and runtime score computation can become expensive when a large ensemble is employed. Previous work has shown that judicious use of the memory hierarchy in a modern CPU architecture can effectively shorten score computation time. However, different traversal methods and blocking parameter settings exhibit different cache and cost behavior depending on data and architectural characteristics, and an exhaustive search for performance comparison and optimum selection is very time-consuming. This paper provides an approximate analytic comparison of the data access performance of cache blocking methods and proposes a fast guided sampling scheme that selects a traversal method and blocking parameters for effective use of the memory hierarchy. Evaluation studies with three datasets show that, within a reasonable amount of time, the proposed scheme can identify a highly competitive solution that significantly accelerates score calculation.
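
Cache blocking in this setting means partitioning the scoring computation over blocks of feature vectors and blocks of ensemble trees so that data brought into cache is reused before it is evicted. The following is a minimal C++ sketch of one such 2D-blocked traversal, not the paper's actual algorithm or data layout; the function blocked_score, the toy Tree type, and the block sizes d_block and t_block are illustrative assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy decision stump standing in for a full regression tree; a real scorer
// would use deeper trees with their own node layout.
struct Tree {
    int feature = 0;
    float threshold = 0.0f;
    double left = 0.0, right = 0.0;
    double score(const std::vector<float>& f) const {
        return f[feature] <= threshold ? left : right;
    }
};

// 2D cache blocking: score d_block documents against t_block trees at a
// time so the feature vectors and trees touched by the inner loops are
// reused while they are still resident in cache (block sizes must be >= 1).
void blocked_score(const std::vector<std::vector<float>>& docs,
                   const std::vector<Tree>& trees,
                   std::vector<double>& scores,
                   std::size_t d_block, std::size_t t_block) {
    scores.assign(docs.size(), 0.0);
    for (std::size_t d0 = 0; d0 < docs.size(); d0 += d_block) {
        const std::size_t d1 = std::min(d0 + d_block, docs.size());
        for (std::size_t t0 = 0; t0 < trees.size(); t0 += t_block) {
            const std::size_t t1 = std::min(t0 + t_block, trees.size());
            // Swapping the document and tree loops gives a different
            // traversal order with different cache behavior.
            for (std::size_t d = d0; d < d1; ++d)
                for (std::size_t t = t0; t < t1; ++t)
                    scores[d] += trees[t].score(docs[d]);
        }
    }
}
```

Different traversal methods correspond to different orderings of these loops, and the block sizes are the blocking parameters whose cache and cost behavior the paper compares analytically and tunes with guided sampling.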



    Published In

    SIGIR '16: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval
    July 2016
    1296 pages
    ISBN:9781450340694
    DOI:10.1145/2911451


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 07 July 2016


    Author Tags

    1. cache locality
    2. ensemble methods
    3. query processing

    Qualifiers

    • Research-article

    Funding Sources

    • NSF

    Conference

    SIGIR '16

    Acceptance Rates

    SIGIR '16 Paper Acceptance Rate: 62 of 341 submissions (18%)
    Overall Acceptance Rate: 792 of 3,983 submissions (20%)


    Cited By

    • (2024) SilvanForge: A Schedule-Guided Retargetable Compiler for Decision Tree Inference. Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles, 488-504. DOI: 10.1145/3694715.3695958. Online publication date: 4-Nov-2024.
    • (2022) Treebeard: An Optimizing Compiler for Decision Tree Based ML Inference. Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture, 494-511. DOI: 10.1109/MICRO56248.2022.00043. Online publication date: 1-Oct-2022.
    • (2021) A simple and efficient storage format for SIMD-accelerated SpMV. Cluster Computing, 24(4), 3431-3448. DOI: 10.1007/s10586-021-03340-1. Online publication date: 1-Dec-2021.
    • (2021) PAS: A new powerful and simple quantum computing simulator. Software: Practice and Experience, 53(1), 142-159. DOI: 10.1002/spe.3049. Online publication date: 29-Oct-2021.
    • (2019) Accelerated Query Processing Via Similarity Score Prediction. Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 485-494. DOI: 10.1145/3331184.3331207. Online publication date: 18-Jul-2019.
    • (2019) Joint Optimization of Cascade Ranking Models. Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, 15-23. DOI: 10.1145/3289600.3290986. Online publication date: 30-Jan-2019.
    • (2018) RapidScorer. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 941-950. DOI: 10.1145/3219819.3219857. Online publication date: 19-Jul-2018.
    • (2018) Universal Approximation Functions for Fast Learning to Rank. The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, 1017-1020. DOI: 10.1145/3209978.3210137. Online publication date: 27-Jun-2018.
    • (2018) Query Driven Algorithm Selection in Early Stage Retrieval. Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, 396-404. DOI: 10.1145/3159652.3159676. Online publication date: 2-Feb-2018.
    • (2017) Efficient Cost-Aware Cascade Ranking in Multi-Stage Retrieval. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 445-454. DOI: 10.1145/3077136.3080819. Online publication date: 7-Aug-2017.
