Communication-efficient algorithms for parallel latent Dirichlet allocation

Yan, Jian-Feng; Zeng, Jia; Gao, Yang; Liu, Zhi-Qiang

doi:10.1007/s00500-014-1376-8

Focus
Published: 18 July 2014

Soft Computing, Volume 19, pages 3–11 (2015)

Jian-Feng Yan¹, Jia Zeng¹, Yang Gao¹ & Zhi-Qiang Liu²

Abstract

Latent Dirichlet allocation (LDA) is a popular topic modeling method that has found many multimedia applications, such as motion analysis and image categorization. Communication cost is one of the main bottlenecks in large-scale parallel learning of LDA. To reduce this cost, we draw on Zipf’s law and propose novel parallel LDA algorithms that communicate only the important part of the information at each learning iteration. The proposed algorithms are much more efficient than current state-of-the-art algorithms in both communication and computation costs. Extensive experiments on large-scale data sets demonstrate that our algorithms greatly reduce communication and computation costs and thereby achieve better scalability.
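The role of Zipf’s law in the abstract can be illustrated with a small sketch. The abstract does not say which information the proposed algorithms select, so the frequency-based selection below is purely an assumption for illustration and should not be read as the authors' method. The function names `zipf_coverage` and `select_frequent_words` are hypothetical.

```python
def zipf_coverage(vocab_size, top_fraction, exponent=1.0):
    """Share of total token mass held by the most frequent ranks,
    assuming word frequency is proportional to rank ** -exponent
    (an idealized Zipf law, not measured data)."""
    weights = [rank ** -exponent for rank in range(1, vocab_size + 1)]
    k = max(1, round(top_fraction * vocab_size))
    return sum(weights[:k]) / sum(weights)


def select_frequent_words(word_counts, top_fraction):
    """Return the indices of the most frequent words.

    Assumption: "important" is approximated here by raw frequency;
    a worker would then synchronize only these rows of its
    topic-word statistics instead of the full matrix.
    """
    k = max(1, round(top_fraction * len(word_counts)))
    ranked = sorted(range(len(word_counts)),
                    key=lambda w: word_counts[w], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Fraction of token mass covered by the top 10% of a 100,000-word vocabulary.
    print(zipf_coverage(100_000, 0.1))
    # Toy count vector: indices of the three most frequent words.
    print(select_frequent_words([5, 100, 1, 40, 2, 300, 7, 1, 60, 3], 0.3))
```

Under this idealized law with exponent 1, the top 10% of a 100,000-word vocabulary holds roughly 81% of the token mass, which suggests why communicating a small, well-chosen subset could cover most of the statistics that matter.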

Author information

Authors and Affiliations

School of Computer Science and Technology, Soochow University, Suzhou, 215006, China
Jian-Feng Yan, Jia Zeng & Yang Gao
School of Creative Media, City University of Hong Kong, Hong Kong, China
Zhi-Qiang Liu

Corresponding author

Correspondence to Jia Zeng.

Additional information

Communicated by L. Xie.

This work is supported by NSFC (Grant Nos. 61272449, 61202029, 61003154, 61373092 and 61033013), Guangdong Province Key Laboratory Project (Grant No. SZU-GDPHPCL-2012-09), Jiangsu Higher Education Institutions of China (Grant No. 12KJA520004), and a GRF grant from RGC UGC Hong Kong (GRF Project No. 9041574), grants from City University of Hong Kong [CityU Project No. 9041574 (CityU 118810) and 9041905 (CityU 119313)] to ZQL.

About this article

Cite this article

Yan, JF., Zeng, J., Gao, Y. et al. Communication-efficient algorithms for parallel latent Dirichlet allocation. Soft Comput 19, 3–11 (2015). https://doi.org/10.1007/s00500-014-1376-8

Published: 18 July 2014
Issue Date: January 2015
DOI: https://doi.org/10.1007/s00500-014-1376-8
