Hashtag-based topic evolution in social media

World Wide Web

Abstract

The rise of online social media has led to an explosion of user-generated content that carries metadata. Tracking how this metadata is distributed is essential to understanding social media. This paper presents two statistical models that detect interpretable topics over time along with their hashtag distributions. A topic is represented by a cluster of words that frequently occur together, and a context is represented by a cluster of hashtags, i.e., a hashtag distribution. The models associate each context with a related topic by jointly modeling words, hashtags, and time. Experiments with real-world datasets demonstrate that the proposed models effectively discover topics over time together with their related contexts.
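To make the abstract's representations concrete, the following is a minimal sketch of what "a topic as a word distribution" and "a context as a hashtag distribution" look like once posts have been assigned to topics. It is not the paper's inference procedure: the topic assignments are hand-written, and the toy posts, the `normalize` and `summarize` helpers, and the normalized timestamps are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Toy posts: (topic id, words, hashtags, timestamp normalized to [0, 1]).
# ASSUMPTION: topic ids are hand-assigned here; a real model would infer them.
posts = [
    (0, ["goal", "match", "goal"], ["#football", "#final"], 0.2),
    (0, ["match", "team"], ["#football"], 0.3),
    (1, ["storm", "rain"], ["#weather", "#alert"], 0.8),
]


def normalize(counts):
    """Turn raw counts into a probability distribution."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def summarize(posts):
    """Per topic: word distribution, hashtag distribution, mean timestamp."""
    words = defaultdict(Counter)
    tags = defaultdict(Counter)
    times = defaultdict(list)
    for topic, post_words, post_tags, timestamp in posts:
        words[topic].update(post_words)
        tags[topic].update(post_tags)
        times[topic].append(timestamp)
    return {
        topic: {
            "words": normalize(words[topic]),        # the topic
            "hashtags": normalize(tags[topic]),      # its context
            "mean_time": sum(times[topic]) / len(times[topic]),
        }
        for topic in words
    }


summary = summarize(posts)
```

Here `summary[0]["hashtags"]` plays the role of the context attached to topic 0, and `mean_time` is only a crude stand-in for a temporal distribution over the topic's lifetime.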

Acknowledgment

This research was supported by the Basic Science Research Program and the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT and Future Planning (numbers 2015R1A2A1A10052665, 2015R1A2A1A15052701 and 2012M3C4A7033344).

Author information

Corresponding author

Correspondence to SangKeun Lee.

About this article

Cite this article

Alam, M.H., Ryu, WJ. & Lee, S. Hashtag-based topic evolution in social media. World Wide Web 20, 1527–1549 (2017). https://doi.org/10.1007/s11280-017-0451-3

  • DOI: https://doi.org/10.1007/s11280-017-0451-3
