Topic category analysis on twitter via cross-media strategy

Multimedia Tools and Applications

Abstract

The growing popularity of social media provides a huge volume of social data, including tweets. These collections can potentially be useful, but how much meaningful data they contain has not been sufficiently researched, especially for South Korean Twitter data. South Korean Twitter data have generally been studied as a political medium; however, previous research has not adequately covered which major topic categories, such as politics, economics, or sports, Twitter activity reflects. In this paper, we present a cross-media approach that characterizes South Korean tweets by inferring their topic category distribution through short-text categorization. We use newspapers as the cross-medium: we examine the categorization of news articles from major newspapers and then train our classifier on the features of each topic category. In addition, to transfer news topics onto South Korean tweets, we propose a word clustering and filtering approach that excludes words providing no semantic content for the topic categories. Using these procedures, we analyze South Korean tweets to determine which topic categories Twitter users primarily focus on. We observe distinctive behaviors of South Korean Twitter users across parameters such as date, time slot, and day of the week. Because our research provides a macroscopic analysis of Twitter data using a cross-media strategy, it can also serve as a useful resource for analyzing other social media.
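The abstract outlines a pipeline: train a topic classifier on categorized news articles, filter out words that carry no topic signal, classify tweets, and aggregate the predicted categories over time. The following is a minimal Python sketch of that pipeline under several assumptions. The toy corpus is invented; the classifier is multinomial Naive Bayes because the abstract does not name the paper's classifier; and the word filter is a simple category-entropy threshold swapped in for the paper's clustering-based filtering, whose details are not given here. All function names and the `max_entropy_ratio` parameter are hypothetical.

```python
import math
from collections import Counter, defaultdict
from datetime import datetime

# Invented toy corpus of (category, news text) pairs. The paper trains on articles
# from major South Korean newspapers; these English strings are placeholders.
NEWS = [
    ("politics", "the assembly passed the election bill today"),
    ("politics", "the president met party leaders about the election"),
    ("economics", "the central bank raised interest rates today"),
    ("economics", "stock market prices fell as rates rose"),
    ("sports", "the team won the baseball game today"),
    ("sports", "the striker scored twice as the team won"),
]
DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def tokenize(text):
    # Whitespace tokenizer; Korean text would likely need a morphological analyzer.
    return text.lower().split()


def informative_vocab(docs, max_entropy_ratio=0.85):
    """Drop words spread almost evenly over categories (hypothetical filter).

    Stand-in for the paper's word clustering and filtering step: a word whose
    normalized category entropy is close to 1 carries little topic signal.
    """
    counts = defaultdict(Counter)  # word -> Counter(category -> occurrences)
    for cat, text in docs:
        for w in tokenize(text):
            counts[w][cat] += 1
    max_h = math.log(len({cat for cat, _ in docs}))
    if max_h == 0:  # a single category: nothing to discriminate
        return set(counts)
    vocab = set()
    for w, c in counts.items():
        total = sum(c.values())
        h = -sum((n / total) * math.log(n / total) for n in c.values())
        if h / max_h < max_entropy_ratio:
            vocab.add(w)
    return vocab


def train_nb(docs, vocab):
    """Multinomial Naive Bayes counts restricted to the filtered vocabulary."""
    doc_counts = Counter(cat for cat, _ in docs)
    word_counts = defaultdict(Counter)  # category -> Counter(word -> occurrences)
    for cat, text in docs:
        word_counts[cat].update(w for w in tokenize(text) if w in vocab)
    return doc_counts, word_counts


def classify(text, model, vocab):
    """Return the category with the highest Laplace-smoothed log posterior."""
    doc_counts, word_counts = model
    n_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for cat in sorted(doc_counts):
        total = sum(word_counts[cat].values())
        score = math.log(doc_counts[cat] / n_docs)
        for w in tokenize(text):
            if w in vocab:  # filtered or unseen words contribute nothing
                score += math.log((word_counts[cat][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = cat, score
    return best


def distribution_by_weekday(tweets, model, vocab):
    """Count predicted categories per day of the week for (ISO time, text) tweets."""
    dist = defaultdict(Counter)
    for timestamp, text in tweets:
        day = DAYS[datetime.fromisoformat(timestamp).weekday()]
        dist[day][classify(text, model, vocab)] += 1
    return dist


if __name__ == "__main__":
    vocab = informative_vocab(NEWS)
    model = train_nb(NEWS, vocab)
    # Invented tweets; 2014-03-03 was a Monday and 2014-03-08 a Saturday.
    tweets = [
        ("2014-03-03T09:15:00", "election day for the party"),
        ("2014-03-08T20:30:00", "what a baseball game the team won"),
    ]
    print(classify("interest rates rose again", model, vocab))  # economics
    print(dict(distribution_by_weekday(tweets, model, vocab)))
```

The entropy threshold is chosen here only for brevity: on the toy data it removes "the" and "today", which occur across all three categories, while keeping category-specific words such as "election" and "rates". The same aggregation could be keyed by date or hour-of-day time slot instead of weekday.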

Figs. 1–10

Acknowledgments

This research was supported by the Basic Science Research Program through the National Research Foundation (NRF) of Korea funded by the Ministry of Science, ICT, and Future Planning (MSIP) (2014R1A1A3051169), and by the Ministry of Education (2012R1A1A2042792).

Conflict of interest

The authors declare that they have no conflict of interest.

Author information

Corresponding author

Correspondence to Kyung-Ah Sohn.

About this article

Cite this article

Cho, S.W., Cha, M. & Sohn, KA. Topic category analysis on twitter via cross-media strategy. Multimed Tools Appl 75, 12879–12899 (2016). https://doi.org/10.1007/s11042-015-2866-0

  • DOI: https://doi.org/10.1007/s11042-015-2866-0