Abstract
Latent Dirichlet allocation (LDA) is a popular topic modeling method that has found many multimedia applications, such as motion analysis and image categorization. Communication cost is one of the main bottlenecks in large-scale parallel learning of LDA. To reduce this cost, we exploit Zipf's law and propose novel parallel LDA algorithms that communicate only a small amount of important information at each learning iteration. The proposed algorithms are substantially more efficient than current state-of-the-art algorithms in both communication and computation. Extensive experiments on large-scale data sets demonstrate that our algorithms greatly reduce communication and computation costs and achieve better scalability.
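The abstract's core intuition can be sketched as follows: under Zipf's law, the frequency of the word at rank r is roughly proportional to 1/r, so a small top-ranked fraction of the vocabulary accounts for most of the update mass in the word-topic counts, and communicating only those entries loses little information. The short Python sketch below illustrates this observation only; the function name `zipf_mass` and the exponent-1 Zipf assumption are illustrative and are not the authors' actual algorithm.

```python
def zipf_mass(V, top_fraction):
    """Fraction of total mass covered by the top `top_fraction` of V
    items whose masses follow Zipf's law with exponent 1 (mass ~ 1/rank)."""
    mass = [1.0 / r for r in range(1, V + 1)]  # Zipf-distributed masses
    k = max(1, int(top_fraction * V))          # number of entries kept
    return sum(mass[:k]) / sum(mass)

# With a 100,000-word vocabulary, transmitting only the top 10% of
# entries still covers roughly 80% of the total mass.
print(zipf_mass(100_000, 0.10))
```

This is why truncating each worker's messages to the most important entries can cut communication sharply while barely perturbing the learned topics.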
Additional information
Communicated by L. Xie.
This work is supported by NSFC (Grant Nos. 61272449, 61202029, 61003154, 61373092 and 61033013), the Guangdong Province Key Laboratory Project (Grant No. SZU-GDPHPCL-2012-09), the Jiangsu Higher Education Institutions of China (Grant No. 12KJA520004), a GRF grant from RGC UGC Hong Kong (GRF Project No. 9041574), and grants from City University of Hong Kong [CityU Project Nos. 9041574 (CityU 118810) and 9041905 (CityU 119313)] to ZQL.
Cite this article
Yan, JF., Zeng, J., Gao, Y. et al. Communication-efficient algorithms for parallel latent Dirichlet allocation. Soft Comput 19, 3–11 (2015). https://doi.org/10.1007/s00500-014-1376-8