DOI: 10.1145/2505515.2505615
research-article

On sampling the wisdom of crowds: random vs. expert sampling of the twitter stream

Published: 27 October 2013

Abstract

Several applications today rely upon content streams crowd-sourced from online social networks. Since real-time processing of the large amounts of data generated on these sites is difficult, analytics companies and researchers are increasingly resorting to sampling. In this paper, we investigate the crucial question of how to sample the data generated by users in social networks. The traditional method is to sample randomly from all the data. We analyze a different sampling methodology, where content is gathered only from a relatively small subset (< 1%) of the user population, namely the expert users. Over the duration of a month, we gathered tweets from over 500,000 Twitter users who are identified as experts on a diverse set of topics, and compared the resulting expert-sampled tweets with the 1% randomly sampled tweets provided publicly by Twitter. We compared the sampled datasets along several dimensions, including the diversity, timeliness, and trustworthiness of the information contained within them, and found important differences between the datasets. Our observations have major implications for applications such as topical search, trustworthy content recommendations, and breaking news detection.

Published In

CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management
October 2013
2612 pages
ISBN:9781450322638
DOI:10.1145/2505515

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. random sampling
  2. sampling content streams
  3. sampling from experts
  4. twitter
  5. twitter lists

Qualifiers

  • Research-article

Conference

CIKM'13
CIKM'13: 22nd ACM International Conference on Information and Knowledge Management
October 27 - November 1, 2013
San Francisco, California, USA

Acceptance Rates

CIKM '13 paper acceptance rate: 143 of 848 submissions (17%)
Overall acceptance rate: 1,861 of 8,427 submissions (22%)
