Peacock: Learning Long-Tail Topic Features for Industrial Applications

Published: 15 July 2015


Latent Dirichlet allocation (LDA) is a popular topic modeling technique in academia but less so in industry, especially in large-scale applications involving search engines and online advertising systems. A main underlying reason is that the topic models used have been too small in scale to be useful; for example, some of the largest LDA models reported in the literature have up to 10³ topics, which can hardly cover the long-tail semantic word sets. In this article, we show that the number of topics is a key factor that can significantly boost the utility of topic-modeling systems. In particular, we show that a “big” LDA model with at least 10⁵ topics inferred from 10⁹ search queries can achieve a significant improvement on industrial search engine and online advertising systems, both of which serve hundreds of millions of users. We develop a novel distributed system called Peacock to learn big LDA models from big data. The main features of Peacock include a hierarchical distributed architecture, real-time prediction, and topic de-duplication. We empirically demonstrate that the Peacock system is capable of providing significant benefits via highly scalable LDA topic models for several industrial applications.


    Published In

    Volume 6, Issue 4
    Regular Papers and Special Section on Intelligent Healthcare Informatics
    August 2015
    Publication History

    Accepted: 01 December 2014
    Revised: 01 October 2014
    Received: 01 May 2014
    Published in TIST Volume 6, Issue 4


    Author Tags

    1. Latent Dirichlet allocation
    2. big data
    3. big topic models
    4. long-tail topic features
    5. online advertising systems
    6. search engine


    • Innovative Research Team in Soochow University
    • Natural Science Foundation of the Jiangsu Higher Education Institutions of China
    • National Grant Fundamental Research (973 Program) of China
    • National Natural Science Foundation of China


