short-paper

Exploiting CPU SIMD Extensions to Speed-up Document Scoring with Tree Ensembles

Authors:
Claudio Lucchese (ISTI-CNR and Istella Srl, Pisa, Italy)
Franco Maria Nardini (ISTI-CNR and Istella Srl, Pisa, Italy)
Salvatore Orlando (Univ. of Venice, Venice, Italy)
Raffaele Perego (ISTI-CNR and Istella Srl, Pisa, Italy)
Nicola Tonellotto (ISTI-CNR and Istella Srl, Pisa, Italy)
Rossano Venturini (Univ. of Pisa and Istella Srl, Pisa, Italy)

SIGIR '16: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, July 2016, pages 833–836. https://doi.org/10.1145/2911451.2914758

Published: 7 July 2016

ABSTRACT

Scoring documents with learning-to-rank (LtR) models based on large ensembles of regression trees is currently considered one of the most effective ways to rank the query results returned by large-scale Information Retrieval systems. This paper investigates how the SIMD capabilities of modern CPUs can be used to evaluate regression tree ensembles efficiently. We propose V-QuickScorer (vQS), which exploits SIMD extensions to vectorize document scoring, i.e., to traverse the ensemble while evaluating multiple documents simultaneously. We provide a comprehensive evaluation of vQS against the state of the art on three publicly available datasets. Experiments show that vQS provides speed-ups of up to 3.2x.
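To make the idea of "evaluating multiple documents simultaneously" concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not describe the data layout, so the sketch assumes the leaf-bitvector scheme of the QuickScorer paper by Lucchese et al. (SIGIR 2015): every node whose test fails clears the leaves of its left subtree, and the leftmost surviving leaf is the exit leaf. SIMD lanes are emulated with Python lists, where each list comprehension stands in for one vector instruction. The names `Node`, `Tree`, `score_batch`, `LANES`, and the toy ensemble are all hypothetical.

```python
from typing import List, NamedTuple

LANES = 4  # hypothetical SIMD width: number of documents scored together


class Node(NamedTuple):
    feature: int      # index of the feature tested by this node
    threshold: float  # the node test is x[feature] <= threshold
    mask: int         # bit i is 0 if leaf i lies in this node's left subtree


class Tree(NamedTuple):
    nodes: List[Node]
    leaves: List[float]  # leaf outputs, left to right (bit i <-> leaf i)


def score_batch(trees: List[Tree], docs: List[List[float]]) -> List[float]:
    """Score exactly LANES documents together, one tree node at a time."""
    assert len(docs) == LANES
    scores = [0.0] * LANES
    for tree in trees:
        all_leaves = (1 << len(tree.leaves)) - 1
        reachable = [all_leaves] * LANES  # one leaf bitvector per lane
        for node in tree.nodes:
            # Emulated vector compare: one threshold against LANES documents.
            failed = [doc[node.feature] > node.threshold for doc in docs]
            # Emulated masked AND: only lanes whose test failed are updated.
            reachable = [v & node.mask if f else v
                         for v, f in zip(reachable, failed)]
        for lane, v in enumerate(reachable):
            exit_leaf = (v & -v).bit_length() - 1  # lowest set bit = leftmost leaf
            scores[lane] += tree.leaves[exit_leaf]
    return scores


# Toy ensemble (assumed for illustration only).
# Tree 1: x0 <= 0.5 ? 1.0 : (x1 <= 2.0 ? 2.0 : 3.0)
# Tree 2: x1 <= 1.5 ? 0.5 : -0.5
TREES = [
    Tree(nodes=[Node(0, 0.5, 0b110), Node(1, 2.0, 0b101)],
         leaves=[1.0, 2.0, 3.0]),
    Tree(nodes=[Node(1, 1.5, 0b10)], leaves=[0.5, -0.5]),
]
DOCS = [[0.2, 5.0], [0.9, 1.0], [0.9, 5.0], [0.2, 1.0]]

print(score_batch(TREES, DOCS))  # [0.5, 2.5, 2.5, 1.5]
```

In a native implementation the two list comprehensions inside the node loop would each map to a single SIMD compare and a single masked bitwise AND, so one pass over a tree's nodes serves all lanes at once. The sketch also visits every node of every tree; any feature-wise ordering of thresholds or early termination is omitted here.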

References

N. Asadi, J. Lin, and A. P. de Vries. Runtime optimizations for tree-based machine learning models. IEEE Transactions on Knowledge and Data Engineering, 26(9):2281–2292, 2014.
G. Capannini, D. Dato, C. Lucchese, M. Mori, F. M. Nardini, S. Orlando, R. Perego, and N. Tonellotto. Quality versus efficiency in document scoring with learning-to-rank models. Information Processing and Management, 2016.
J. H. Friedman. Greedy function approximation: a gradient boosting machine. Annals of Statistics, pages 1189–1232, 2001.
C. Lucchese, F. M. Nardini, S. Orlando, R. Perego, N. Tonellotto, and R. Venturini. QuickScorer: A fast algorithm to rank documents with additive ensembles of regression trees. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pages 73–82. ACM, 2015.
O. Polychroniou, A. Raghavan, and K. A. Ross. Rethinking SIMD vectorization for in-memory databases. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD '15, pages 1493–1508, New York, NY, USA, 2015. ACM.
Q. Wu, C. J. Burges, K. M. Svore, and J. Gao. Adapting boosting for information retrieval measures. Information Retrieval, 2010.

Index Terms

Exploiting CPU SIMD Extensions to Speed-up Document Scoring with Tree Ensembles
1. Computing methodologies
  1. Machine learning
    1. Machine learning algorithms
      1. Ensemble methods
2. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
      1. Learning to rank

Published in
SIGIR '16: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval
July 2016
1296 pages
ISBN:9781450340694
DOI:10.1145/2911451
General Chairs: Raffaele Perego (ISTI-CNR, Italy), Fabrizio Sebastiani (Qatar Computing Research Institute, HBKU, Qatar)
Program Chairs: Javed Aslam (Northeastern University, US), Ian Ruthven (University of Strathclyde, UK), Justin Zobel (University of Melbourne, Australia)
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
document scoring
ensemble methods
learning to rank
Qualifiers
- short-paper
Acceptance Rates
SIGIR '16 paper acceptance rate: 62 of 341 submissions, 18%. Overall acceptance rate: 792 of 3,983 submissions, 20%.