Communication-efficient algorithms for parallel latent Dirichlet allocation

  • Focus
  • Soft Computing

Abstract

Latent Dirichlet allocation (LDA) is a popular topic modeling method that has found many multimedia applications, such as motion analysis and image categorization. Communication cost is one of the main bottlenecks in large-scale parallel learning of LDA. To reduce this cost, we exploit Zipf’s law and propose novel parallel LDA algorithms that communicate only a partial, important subset of the information at each learning iteration. The proposed algorithms are much more efficient than current state-of-the-art algorithms in both communication and computation costs. Extensive experiments on large-scale data sets demonstrate that our algorithms greatly reduce communication and computation costs and thereby achieve better scalability.
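The idea of communicating only the important part of the information can be illustrated with a small sketch. Under Zipf's law, a word's frequency is roughly inversely proportional to its frequency rank, so a few high-frequency words account for most tokens. The code below is only an illustration under stated assumptions, not the selection rule used in the paper (which this preview does not show): the synthetic counts, the `coverage` threshold, and the function names `zipf_counts`, `select_head_words`, and `partial_update` are all hypothetical.

```python
from typing import Dict, List


def zipf_counts(vocab_size: int, scale: float = 100_000.0) -> List[int]:
    """Synthetic word counts following Zipf's law: count(rank r) ~ scale / r."""
    return [max(1, round(scale / rank)) for rank in range(1, vocab_size + 1)]


def select_head_words(counts: List[int], coverage: float) -> List[int]:
    """Return the smallest set of word ids, most frequent first, whose counts
    add up to at least `coverage` of all tokens (hypothetical rule)."""
    order = sorted(range(len(counts)), key=lambda w: counts[w], reverse=True)
    target = coverage * sum(counts)
    selected: List[int] = []
    running = 0
    for w in order:
        if running >= target:
            break
        selected.append(w)
        running += counts[w]
    return selected


def partial_update(
    local_delta: Dict[int, List[float]], head_words: List[int]
) -> Dict[int, List[float]]:
    """Keep only the per-topic rows of the selected words; in a parallel
    setting, only these rows would be exchanged between processors."""
    head = set(head_words)
    return {w: row for w, row in local_delta.items() if w in head}


counts = zipf_counts(vocab_size=10_000)
head_words = select_head_words(counts, coverage=0.5)
fraction_sent = len(head_words) / len(counts)
print(f"words communicated: {len(head_words)} of {len(counts)}")
```

With these synthetic counts, the words covering half of all tokens form only a small fraction of the vocabulary, which is why restricting communication to them can shrink the exchanged data considerably.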

Notes

  1. http://www.mpich.org/.

  2. http://archive.ics.uci.edu/ml/datasets/.

  3. http://en.wikipedia.org.

References

  • Ahmed A, Aly M, Gonzalez J, Narayanamurthy S, Smola A (2012) Scalable inference in latent variable models. In: WSDM, pp 123–132

  • Blei DM (2012) Introduction to probabilistic topic models. Commun ACM 55(4):77–84

  • Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022

  • Canini KR, Shi L, Griffiths TL (2009) Online inference of topics with latent Dirichlet allocation. In: AISTATS, pp 65–72

  • Griffiths TL, Steyvers M (2004) Finding scientific topics. Proc Natl Acad Sci 101:5228–5235

  • Hoffman M, Blei D, Bach F (2010) Online learning for latent Dirichlet allocation. In: NIPS, pp 856–864

  • Liu Z, Zhang Y, Chang E, Sun M (2011) PLDA+: parallel latent Dirichlet allocation with data placement and pipeline processing. ACM Trans Intell Syst Technol 2(3):26:1–26:18

  • Newman D, Asuncion A, Smyth P, Welling M (2009) Distributed algorithms for topic models. J Mach Learn Res 10:1801–1828

  • Smola A, Narayanamurthy S (2010) An architecture for parallel topic models. In: PVLDB, pp 703–710

  • Wang Y, Bai H, Stanton M, Chen WY, Chang E (2009) PLDA: parallel latent Dirichlet allocation for large-scale applications. In: Algorithmic aspects in information and management, pp 301–314

  • Winn J, Bishop CM (2005) Variational message passing. J Mach Learn Res 6:661–694

  • Zeng J (2012) A topic modeling toolbox using belief propagation. J Mach Learn Res 13:2233–2236

  • Zeng J, Cao XQ, Liu ZQ (2012) Residual belief propagation for topic modeling. In: ADMA, pp 739–752

  • Zeng J, Cheung WK, Liu J (2013) Learning topic models by belief propagation. IEEE Trans Pattern Anal Mach Intell 35(5):1121–1134

  • Zeng J, Liu ZQ, Cao XQ (2012) A new approach to speeding up topic modeling. arXiv:1204.0170 [cs.LG]

  • Zeng J, Liu ZQ, Cao XQ (2012) Online belief propagation for topic modeling. arXiv:1210.2179

  • Zhai K, Boyd-Graber J, Asadi N (2011) Using variational inference and MapReduce to scale topic modeling. arXiv:1107.3765v1 [cs.AI]

  • Zipf GK (1949) Human behavior and the principle of least effort. Addison-Wesley, Cambridge

Author information

Corresponding author

Correspondence to Jia Zeng.

Additional information

Communicated by L. Xie.

This work is supported by NSFC (Grant Nos. 61272449, 61202029, 61003154, 61373092 and 61033013), the Guangdong Province Key Laboratory Project (Grant No. SZU-GDPHPCL-2012-09), Jiangsu Higher Education Institutions of China (Grant No. 12KJA520004), a GRF grant from RGC UGC Hong Kong (GRF Project No. 9041574), and grants from City University of Hong Kong [CityU Project Nos. 9041574 (CityU 118810) and 9041905 (CityU 119313)] to ZQL.

Cite this article

Yan, JF., Zeng, J., Gao, Y. et al. Communication-efficient algorithms for parallel latent Dirichlet allocation. Soft Comput 19, 3–11 (2015). https://doi.org/10.1007/s00500-014-1376-8

  • DOI: https://doi.org/10.1007/s00500-014-1376-8
