research-article

BUTE: bursty users tagging method estimated by time series data

Authors:
Shuhei Yamamoto, University of Tsukuba, Tsukuba, Ibaraki, Japan
Kei Wakabayashi, University of Tsukuba, Tsukuba, Ibaraki, Japan
Noriko Kando, National Inst. of Informatics, Chiyoda, Tokyo, Japan
Tetsuji Satoh, University of Tsukuba, Tsukuba, Ibaraki, Japan

iiWAS '15: Proceedings of the 17th International Conference on Information Integration and Web-based Applications & Services, December 2015, Article No.: 20, Pages 1–9, https://doi.org/10.1145/2837185.2837198

Published: 11 December 2015

ABSTRACT

Many Twitter users post tweets related to their particular interests, and users can also collect information by following other users. One way to make user interests explicit is to tag users with labels. A user tagging method is important for discovering candidate users with similar interests. Typical approaches estimate user interests from the terms in tweets or by applying graph-based analysis to structures such as the follow network. In contrast, we propose a new user tagging method that uses time series data of the number of posted tweets, based on the following hypothesis: users who are interested in a topic post more tweets when a related event occurs than at ordinary times. Based on this hypothesis, we extract interests as burst levels from user and hashtag time series data with Kleinberg's burst enumerating algorithm. We treat the burst levels of users like term frequencies in documents and calculate a hashtag score for each user with three typical score calculation methods: cosine similarity, Naive Bayes, and TF-IDF. The proposed method therefore needs no linguistic analysis, which requires heavy computational resources. Through experimental evaluations with actually active users, we demonstrate the high efficiency of our tagging methods, evaluate them with information retrieval evaluation metrics such as expected reciprocal rank (ERR) and Q-measure, and clarify the strengths and limitations of each one. Naive Bayes and cosine similarity are especially suitable for user tagging and tag score calculation tasks.
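The scoring and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes burst levels have already been extracted (the burst detection step itself is not shown), that each user and each hashtag is represented as a vector of burst levels over the same time slots, and that cosine similarity is applied directly to those vectors. The function names, hashtags, and all numbers are hypothetical. The ERR function follows the standard graded-relevance definition, in which a grade g is mapped to a stopping probability (2^g - 1) / 2^g_max.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length burst-level vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a series with no bursts carries no signal
    return dot / (norm_u * norm_v)


def rank_hashtags(user_bursts, hashtag_bursts):
    """Score every hashtag against one user and sort by descending score."""
    scores = {tag: cosine(user_bursts, levels)
              for tag, levels in hashtag_bursts.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def expected_reciprocal_rank(grades, max_grade):
    """ERR for a ranked list of relevance grades (0 .. max_grade)."""
    err = 0.0
    p_continue = 1.0  # probability the reader reaches the current rank
    for rank, grade in enumerate(grades, start=1):
        p_stop = (2 ** grade - 1) / (2 ** max_grade)
        err += p_continue * p_stop / rank
        p_continue *= 1.0 - p_stop
    return err


# Hypothetical burst levels over six time slots (not data from the paper).
user = [0, 2, 3, 0, 0, 1]
hashtags = {
    "#eclipse": [0, 1, 3, 0, 0, 0],
    "#soccer": [2, 0, 0, 0, 3, 0],
    "#news": [1, 1, 1, 1, 1, 1],
}

ranking = rank_hashtags(user, hashtags)
for tag, score in ranking:
    print(f"{tag}: {score:.3f}")

# Hypothetical relevance grades assigned to the top three tags by an assessor.
print(f"ERR = {expected_reciprocal_rank([2, 0, 1], max_grade=2):.4f}")
```

In this toy setting the hashtag whose bursts coincide with the user's bursts receives the highest score, and a hashtag that bursts only when the user is quiet scores zero. No tweet text is inspected, which reflects the abstract's point that the approach avoids linguistic analysis.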

Index Terms

1. Information systems
  1. Information retrieval
  2. Information systems applications

Published in
iiWAS '15: Proceedings of the 17th International Conference on Information Integration and Web-based Applications & Services
December 2015
704 pages
ISBN: 9781450334914
DOI: 10.1145/2837185
General Chair: Gabriele Anderst-Kotsis, Johannes Kepler University Linz, Austria
Program Chair: Maria Indrawan-Santiago, Monash University, Australia
Copyright © 2015 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
ERR
Q-measure
TF-IDF
Twitter
burst
cosine similarity
hashtag
naive bayes
time series data
user tagging