DOI: 10.1145/2837185.2837198
research-article

BUTE: bursty users tagging method estimated by time series data

Published: 11 December 2015

ABSTRACT

Many Twitter users post tweets related to their particular interests, and users can also collect information by following other users. One way to make user interests explicit is to tag users with labels. Such user tagging is important for discovering candidate users with similar interests. Typical approaches estimate user interests from the terms in tweets or by applying graph analysis to structures such as the follow network. In contrast, we propose a new user tagging method that uses time series data of the number of tweets posted. It rests on the following hypothesis: because users have interests, they post more tweets when events related to those interests occur than at other times. Based on this hypothesis, we extract interests as burst levels from the user and hashtag time series data with Kleinberg's burst enumerating algorithm. We treat the burst levels of users as term frequencies in documents and calculate hashtag scores for each user with three typical scoring methods: cosine similarity, Naive Bayes, and TF-IDF. The proposed method therefore needs no linguistic analysis, which requires heavy computational resources. In experimental evaluations with active users, we demonstrate the high efficiency of our tagging methods, evaluate them with information retrieval metrics such as expected reciprocal rank (ERR) and Q-measure, and clarify the strengths and limitations of each. Naive Bayes and cosine similarity are especially suitable for user tagging and tag score calculation tasks.
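The scoring and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the burst levels below are made-up toy values (in the paper they would come from Kleinberg's algorithm), the one-bin-per-time-window vector layout is an assumption, and the function names `cosine_similarity`, `rank_hashtags`, and `expected_reciprocal_rank` are hypothetical. Only the cosine-similarity variant is shown, followed by the usual ERR formula for graded relevance.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length burst-level vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a series with no bursts matches nothing
    return dot / (norm_u * norm_v)


def rank_hashtags(user_bursts, hashtag_bursts):
    """Score every hashtag against one user and sort by descending score."""
    scores = {
        tag: cosine_similarity(user_bursts, levels)
        for tag, levels in hashtag_bursts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def expected_reciprocal_rank(grades, max_grade):
    """ERR of a ranked list of relevance grades (0 .. max_grade)."""
    err = 0.0
    p_continue = 1.0  # probability the reader has not stopped yet
    for rank, grade in enumerate(grades, start=1):
        p_stop = (2 ** grade - 1) / (2 ** max_grade)
        err += p_continue * p_stop / rank
        p_continue *= 1.0 - p_stop
    return err


# Toy data (assumption): one burst level per time window.
user_bursts = [0, 2, 0, 1, 0, 0]
hashtag_bursts = {
    "#eclipse": [0, 3, 0, 1, 0, 0],
    "#football": [1, 0, 0, 0, 2, 0],
    "#election": [0, 1, 1, 0, 0, 1],
}

ranking = rank_hashtags(user_bursts, hashtag_bursts)
for tag, score in ranking:
    print(f"{tag}: {score:.3f}")
```

In this toy example the user bursts in the same windows as `#eclipse`, so that hashtag ranks first, while `#football` shares no bursting window with the user and scores zero. No tweet text is inspected at any point, which is the property the abstract emphasizes.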

Published in

iiWAS '15: Proceedings of the 17th International Conference on Information Integration and Web-based Applications & Services
December 2015
704 pages
ISBN: 9781450334914
DOI: 10.1145/2837185

Copyright © 2015 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery
New York, NY, United States
