Improving Operational Intensity in Data Bound Markov Chain Monte Carlo

https://doi.org/10.1016/j.procs.2017.05.024
Under a Creative Commons license
open access

Abstract

Typically, parallel algorithms are developed to leverage the processing power of multiple processors simultaneously, speeding up overall execution. At the same time, the discrepancy between DRAM bandwidth and microprocessor speed hinders reaching peak performance. This paper explores how operational intensity improves when useful computation is performed during otherwise stalled cycles. While the proposed methodology is applicable to a wide variety of parallel algorithms and at different scales, the concepts are demonstrated in a machine learning context. Performance improvements are shown for Bayesian logistic regression with a Markov chain Monte Carlo sampler, using either multiple chains or multiple proposals, on a dense data set two orders of magnitude larger than the last-level cache on contemporary systems.
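The idea of raising operational intensity (floating-point operations per byte moved from memory) can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the plain-Python data layout, and the cost model (one multiply-add per feature, 8-byte values, parameter vectors assumed to stay in cache) are all assumptions made for illustration. The sketch evaluates the logistic regression log-likelihood for several parameter vectors, standing in for multiple chains or multiple proposals, while streaming over the data only once.

```python
import math


def log_likelihoods_single_pass(X, y, thetas):
    """Log-likelihood of each parameter vector in `thetas`, computed in one
    pass over the data (hypothetical helper, labels y are 0 or 1)."""
    totals = [0.0] * len(thetas)
    for row, label in zip(X, y):            # each data row is read once ...
        for k, theta in enumerate(thetas):  # ... and reused for every chain/proposal
            z = sum(w * x for w, x in zip(theta, row))
            # label * z - log(1 + exp(z)), written in a numerically stable form
            totals[k] += label * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return totals


def operational_intensity(n_features, n_thetas, bytes_per_value=8):
    """Rough FLOPs per byte of streamed data under the assumed cost model."""
    flops_per_row = 2 * n_features * n_thetas   # multiply + add per feature per theta
    bytes_per_row = n_features * bytes_per_value
    return flops_per_row / bytes_per_row
```

Under this assumed cost model, `operational_intensity(100, 1)` gives 0.25 FLOPs per byte and `operational_intensity(100, 8)` gives 2.0: the data traffic is unchanged while the useful work per loaded row grows with the number of chains or proposals.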

Keywords

Operational intensity
MCMC
Bayesian logistic regression
HPC
Big Data
