research-article

Using Butterfly-patterned Partial Sums to Draw from Discrete Distributions

Authors:

Guy L. Steele Jr.,

Jean-Baptiste Tristan

ACM Transactions on Parallel Computing (TOPC), Volume 6, Issue 4

Article No.: 22, Pages 1 - 30

https://doi.org/10.1145/3365662

Published: 19 November 2019

Abstract

We describe a SIMD technique for drawing values from multiple discrete distributions, such as sampling from the random variables of a mixture model, that avoids computing a complete table of partial sums of the relative probabilities. A table of alternate (“butterfly-patterned”) form is faster to compute, making better use of coalesced memory accesses; from this table, complete partial sums are computed on the fly during a binary search. Measurements using CUDA 7.5 on an NVIDIA Titan Black GPU show that this technique makes an entire machine-learning application that uses a Latent Dirichlet Allocation topic model with 1,024 topics about 13% faster (when using single-precision floating-point data) or about 35% faster (when using double-precision floating-point data) than doing a straightforward matrix transposition after using coalesced accesses.
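
For orientation, the following is a minimal sequential sketch of the conventional approach that the abstract contrasts against: build a complete table of partial sums of the relative probabilities, then binary-search it with a scaled uniform variate. It is not the butterfly-patterned scheme and makes no attempt at SIMD execution or coalesced memory access; the function names `partial_sums`, `draw`, and `draw_many` are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from itertools import accumulate
import random


def partial_sums(weights):
    """Return the complete table of inclusive partial sums of the weights."""
    return list(accumulate(weights))


def draw(prefix, u):
    """Map u in [0, 1) to the first index whose partial sum exceeds u * total."""
    target = u * prefix[-1]
    # Binary search; the clamp guards against rounding pushing target up to total.
    return min(bisect_right(prefix, target), len(prefix) - 1)


def draw_many(weight_rows, rng):
    """Draw one index from each row of relative (unnormalized) probabilities."""
    return [draw(partial_sums(row), rng.random()) for row in weight_rows]


if __name__ == "__main__":
    rng = random.Random(2019)  # arbitrary seed, for repeatability only
    rows = [[1.0, 2.0, 3.0, 4.0], [5.0, 0.0, 5.0, 0.0]]
    print(draw_many(rows, rng))
```

In this sketch each distribution gets its own full prefix table before the search starts. The technique described in the abstract instead stores a table of a different form and reconstructs the needed partial sums during the binary search itself.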

Published In

ACM Transactions on Parallel Computing Volume 6, Issue 4

December 2019

188 pages

ISSN:2329-4949

EISSN:2329-4957

DOI:10.1145/3372747

Editor:
David A. Bader
New Jersey Institute of Technology, USA

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 November 2019

Accepted: 01 May 2019

Revised: 01 March 2019

Received: 01 August 2018

Published in TOPC Volume 6, Issue 4

Qualifiers

Research-article
Research
Refereed
