ABSTRACT
Many Twitter users post tweets related to their particular interests, and they can also collect information by following other users. One way to make user interests explicit is to tag users with labels. User tagging is important for discovering candidate users with similar interests. Typical approaches estimate user interests from the terms in tweets and from graph analysis of structures such as follow networks. In contrast, we propose a new user tagging method that uses time series data of the number of posted tweets, based on the following hypothesis: because users have interests, they post more tweets when events related to those interests occur than at other times. Based on this hypothesis, we extract interests as burst levels from the user and hashtag time series data with Kleinberg's burst enumeration algorithm. We treat the burst levels of users like term frequencies in documents and calculate the hashtag scores for each user with three typical score calculation methods: cosine similarity, Naive Bayes, and TF-IDF. The proposed method therefore needs no linguistic analysis, which requires heavy computational resources. In experimental evaluations with active users, we demonstrate the high efficiency of our tagging methods, evaluate them with information retrieval metrics such as expected reciprocal rank (ERR) and Q-measure, and clarify the strengths and limitations of each. Naive Bayes and cosine similarity are especially suitable for user tagging and tag score calculation tasks.
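The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that burst levels per time slot have already been extracted (for example with Kleinberg's algorithm) and it covers only the cosine similarity variant. The function names `cosine_similarity` and `rank_hashtags`, the hashtag names, and all burst values are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length burst-level vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # A series with no bursts has zero norm; score it as 0 instead of dividing by zero.
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_hashtags(user_bursts, hashtag_bursts):
    """Score every hashtag against one user and sort by descending score."""
    scores = {
        tag: cosine_similarity(user_bursts, levels)
        for tag, levels in hashtag_bursts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical burst levels, one value per time slot (assumed precomputed).
user = [0, 2, 0, 0, 1, 0]
hashtags = {
    "#eclipse": [0, 3, 0, 0, 1, 0],   # bursts in the same slots as the user
    "#football": [1, 0, 0, 2, 0, 0],  # bursts in different slots
    "#quiet": [0, 0, 0, 0, 0, 0],     # never bursts
}

ranking = rank_hashtags(user, hashtags)
for tag, score in ranking:
    print(f"{tag}: {score:.3f}")
```

In this toy data the user bursts in the same time slots as `#eclipse`, so that hashtag ranks first, while hashtags whose bursts never overlap with the user's receive a score of zero. No tweet text is inspected at any point, which matches the abstract's claim that no linguistic analysis is needed.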
Index Terms
- BUTE: bursty users tagging method estimated by time series data