Abstract
The rise of online social media has led to an explosion of user-generated content that carries metadata. Tracking how this metadata is distributed is essential to understanding social media. This paper presents two statistical models that detect interpretable topics over time together with their hashtag distributions. A topic is represented by a cluster of words that frequently occur together, and a context is represented by a cluster of hashtags, i.e., a hashtag distribution. The models associate each context with a related topic by jointly modeling words, hashtags, and time. Experiments on real-world datasets demonstrate that the proposed models effectively discover topics over time along with their related contexts.
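The representation described in the abstract can be illustrated with a minimal sketch. This is not the paper's inference procedure: the toy posts, the topic assignments, and the names `posts`, `normalize`, and `summarize` are all hypothetical, and the topic of each post is assumed to be given rather than inferred. The sketch only shows what it means for a topic to have a word distribution, a hashtag distribution (its context), and a position in time.

```python
from collections import Counter, defaultdict

# Hypothetical toy posts: (topic_id, words, hashtags, timestamp scaled to [0, 1]).
# Topic assignments are assumed to be known here; a real topic model infers them.
posts = [
    (0, ["match", "goal", "team"], ["#worldcup"], 0.10),
    (0, ["goal", "team", "win"], ["#worldcup", "#football"], 0.20),
    (1, ["vote", "poll", "debate"], ["#election"], 0.80),
    (1, ["vote", "debate"], ["#election", "#politics"], 0.90),
]


def normalize(counter):
    """Turn raw counts into a probability distribution."""
    total = sum(counter.values())
    return {key: count / total for key, count in counter.items()}


def summarize(posts):
    """Collect per-topic word, hashtag, and time statistics."""
    words = defaultdict(Counter)
    hashtags = defaultdict(Counter)
    times = defaultdict(list)
    for topic, post_words, post_hashtags, timestamp in posts:
        words[topic].update(post_words)
        hashtags[topic].update(post_hashtags)
        times[topic].append(timestamp)
    return {
        topic: {
            "words": normalize(words[topic]),        # the topic
            "hashtags": normalize(hashtags[topic]),  # the context
            "mean_time": sum(times[topic]) / len(times[topic]),
        }
        for topic in words
    }


summary = summarize(posts)
```

Here `summary[0]["hashtags"]` plays the role of the context attached to topic 0, and `mean_time` is a crude stand-in for a distribution over timestamps.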
Acknowledgment
This research was supported by the Basic Science Research Program and the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT and Future Planning (numbers 2015R1A2A1A10052665, 2015R1A2A1A15052701 and 2012M3C4A7033344).
Cite this article
Alam, M.H., Ryu, WJ. & Lee, S. Hashtag-based topic evolution in social media. World Wide Web 20, 1527–1549 (2017). https://doi.org/10.1007/s11280-017-0451-3
DOI: https://doi.org/10.1007/s11280-017-0451-3