GLDA: Parallel Gibbs Sampling for Latent Dirichlet Allocation on GPU

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 626)

Abstract

With the development of general-purpose computing on GPUs, more and more algorithms are being moved to the GPU to gain much higher speed. In this paper, we propose an approach that accelerates Gibbs sampling for the LDA (Latent Dirichlet Allocation) algorithm on the GPU by distributing the data evenly across the GPU cores, which avoids idle waiting and improves GPU utilization. We test the algorithm on three text mining datasets. Experiments show that our parallel method achieves about a 30x speedup over sequential training with similar prediction precision. Furthermore, the idea of uniformly partitioning data across the GPU can also be applied to other machine learning algorithms.
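
As a rough, hypothetical sketch of the partitioning idea described in the abstract (not the authors' actual implementation), the following CUDA kernel assigns tokens to threads in a round-robin fashion, so every thread owns a nearly equal share of the work during one collapsed Gibbs sweep. All names (glda_sample_kernel, n_dk, n_kw, prob_buf, rand_u) are illustrative assumptions, and the topic counts are read without locking, a common approximation in parallel Gibbs samplers.

// Hypothetical sketch, not the paper's implementation.
// One collapsed-Gibbs sweep over LDA token assignments; token i is owned by
// thread (i mod n_threads), so all GPU cores receive nearly equal work.
__global__ void glda_sample_kernel(int n_tokens, int K, int V,
                                   float alpha, float beta,
                                   const int *doc_id,   // document of each token
                                   const int *word_id,  // word of each token
                                   int *topic,          // current topic of each token
                                   int *n_dk,           // document-topic counts (n_docs x K)
                                   int *n_kw,           // topic-word counts (K x V)
                                   int *n_k,            // per-topic totals (K)
                                   const float *rand_u, // one uniform draw per token
                                   float *prob_buf)     // scratch, n_threads x K floats
{
    int tid       = blockIdx.x * blockDim.x + threadIdx.x;
    int n_threads = gridDim.x * blockDim.x;

    for (int i = tid; i < n_tokens; i += n_threads) {
        int d = doc_id[i], w = word_id[i], old_k = topic[i];

        // Remove the current assignment from the counts.
        atomicSub(&n_dk[d * K + old_k], 1);
        atomicSub(&n_kw[old_k * V + w], 1);
        atomicSub(&n_k[old_k], 1);

        // Unnormalized collapsed-Gibbs probabilities p(k | rest), stored as a CDF.
        // Counts are read non-atomically, an approximation when threads race.
        float *p = prob_buf + (size_t)tid * K;
        float sum = 0.0f;
        for (int k = 0; k < K; ++k) {
            sum += (n_dk[d * K + k] + alpha) *
                   (n_kw[k * V + w] + beta) /
                   (n_k[k] + V * beta);
            p[k] = sum;
        }

        // Invert the CDF with the precomputed uniform draw to pick the new topic.
        float u = rand_u[i] * sum;
        int new_k = 0;
        while (new_k < K - 1 && p[new_k] < u) ++new_k;

        // Add the new assignment back into the counts.
        topic[i] = new_k;
        atomicAdd(&n_dk[d * K + new_k], 1);
        atomicAdd(&n_kw[new_k * V + w], 1);
        atomicAdd(&n_k[new_k], 1);
    }
}

Under these assumptions, a host-side loop would launch the kernel once per Gibbs iteration and refresh rand_u (e.g. with cuRAND) between launches; how the actual GLDA kernel balances the load and synchronizes counts is described in the paper itself.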

Acknowledgments

This work is supported by the Natural Science Fund of Tianjin City (No. 16JCYBJC15200), the Open Project Fund of the State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences (No. CARCH201504), the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20130031120029), and the Open Fund of provincial- and ministerial-level scientific research institutions, Civil Aviation University of China (No. CAAC-ISECCA-201502).

Author information

Corresponding author

Correspondence to Tao Li.

Copyright information

© 2016 Springer Science+Business Media Singapore

About this paper

Cite this paper

Xue, P., Li, T., Zhao, K., Dong, Q., Ma, W. (2016). GLDA: Parallel Gibbs Sampling for Latent Dirichlet Allocation on GPU. In: Wu, J., Li, L. (eds) Advanced Computer Architecture. ACA 2016. Communications in Computer and Information Science, vol 626. Springer, Singapore. https://doi.org/10.1007/978-981-10-2209-8_9

  • DOI: https://doi.org/10.1007/978-981-10-2209-8_9

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-2208-1

  • Online ISBN: 978-981-10-2209-8

  • eBook Packages: Computer Science (R0)
