Abstract
Topic modeling has become a popular research area that offers new ways to search, browse, and summarize large amounts of text. Topic modeling methods try to uncover the hidden thematic structure in document collections. Social networks are among the strongest communication tools and produce large amounts of opinions and attitudes on world events, so applying topic modeling to them can be useful for analysis during crisis situations, elections, the launch of a new product on the market, etc. For that reason, in this paper we propose a tool for topic modeling over text streams from social networks. The description of the proposed tool is complemented by practical experiments. The experiments showed promising results for our tool on real data in comparison with state-of-the-art methods.
Acknowledgments
The work presented in this paper was supported by the Slovak VEGA grant 1/0493/16 and Slovak KEGA grant 025TUKE-4/2015.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Smatana, M., Paralič, J., Butka, P. (2016). Topic Modeling over Text Streams from Social Media. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2016. Lecture Notes in Computer Science, vol 9924. Springer, Cham. https://doi.org/10.1007/978-3-319-45510-5_19
DOI: https://doi.org/10.1007/978-3-319-45510-5_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45509-9
Online ISBN: 978-3-319-45510-5
eBook Packages: Computer Science, Computer Science (R0)