Collapsed Gibbs Sampling of Beta-Liouville Multinomial for Short Text Clustering

  • Conference paper
Advances and Trends in Artificial Intelligence. Artificial Intelligence Practices (IEA/AIE 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12798)

Abstract

With the rise of social media, more and more text data is being collected through platforms such as Facebook and Twitter. This abundance of data brings with it the challenges of short texts. In this paper, we propose a Collapsed Gibbs Sampling Beta-Liouville Multinomial (CGSBLM) approach to cope with these challenges. We evaluate the proposed CGSBLM on two datasets extracted from the Google News corpus. Besides delivering better performance, our approach addresses the limitations associated with short text clustering.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Hannachi, S., Najar, F., Ihou, K.E., Bouguila, N. (2021). Collapsed Gibbs Sampling of Beta-Liouville Multinomial for Short Text Clustering. In: Fujita, H., Selamat, A., Lin, J.CW., Ali, M. (eds) Advances and Trends in Artificial Intelligence. Artificial Intelligence Practices. IEA/AIE 2021. Lecture Notes in Computer Science, vol 12798. Springer, Cham. https://doi.org/10.1007/978-3-030-79457-6_48

  • DOI: https://doi.org/10.1007/978-3-030-79457-6_48

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-79456-9

  • Online ISBN: 978-3-030-79457-6

  • eBook Packages: Computer Science, Computer Science (R0)
