Abstract
With the rise of social media, ever more text data is being collected through platforms such as Facebook and Twitter. This abundance of data brings with it the challenges specific to short texts. In this paper, we propose a Collapsed Gibbs Sampling Beta-Liouville Multinomial (CGSBLM) approach to cope with these challenges. We evaluate the proposed CGSBLM on two datasets extracted from the Google News corpus. In addition to delivering better performance, our approach addresses the limitations associated with short text clustering.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Hannachi, S., Najar, F., Ihou, K.E., Bouguila, N. (2021). Collapsed Gibbs Sampling of Beta-Liouville Multinomial for Short Text Clustering. In: Fujita, H., Selamat, A., Lin, J.CW., Ali, M. (eds) Advances and Trends in Artificial Intelligence. Artificial Intelligence Practices. IEA/AIE 2021. Lecture Notes in Computer Science, vol 12798. Springer, Cham. https://doi.org/10.1007/978-3-030-79457-6_48
DOI: https://doi.org/10.1007/978-3-030-79457-6_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-79456-9
Online ISBN: 978-3-030-79457-6
eBook Packages: Computer Science, Computer Science (R0)