Twitter analytics: a big data management perspective

Published: 25 September 2014

Abstract

Since the inception of the Twitter microblogging platform in 2006, a myriad of research efforts have emerged studying different aspects of the Twittersphere. Each study exploits its own tools and mechanisms to capture, store, query and analyze Twitter data. Inevitably, platforms have been developed to replace this ad-hoc exploration with a more structured and methodical form of analysis. Another body of literature focuses on developing languages for querying tweets. This paper addresses issues around the big data nature of Twitter and emphasizes the need for new data management and query language frameworks that address the limitations of existing systems. We review existing approaches developed to facilitate Twitter analytics, followed by a discussion of research issues and technical challenges in developing integrated solutions.

Published In

ACM SIGKDD Explorations Newsletter, Volume 16, Issue 1
Special issue on big data
June 2014
63 pages
ISSN: 1931-0145
EISSN: 1931-0153
DOI: 10.1145/2674026

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 September 2014
Published in SIGKDD Volume 16, Issue 1

Qualifiers

  • Research-article

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 47
  • Downloads (last 6 weeks): 3

Reflects downloads up to 20 Feb 2025.

Cited By

  • (2024) Socializing IR: Turkish IR Scholars and their Twitter Interactions. All Azimuth: A Journal of Foreign Policy and Peace 13:1 (1-20). DOI: 10.20991/allazimuth.1416584. Online publication date: 24-Jan-2024
  • (2023) Fabricate the Auto-aquaculture Structure with Android Monitoring System. International Journal of Advanced Network, Monitoring and Controls 8:1 (83-91). DOI: 10.2478/ijanmc-2023-0049. Online publication date: 31-May-2023
  • (2023) Twitter as a predictive system: A systematic literature review. Journal of Business Research 157 (113561). DOI: 10.1016/j.jbusres.2022.113561. Online publication date: Mar-2023
  • (2023) A Data Quality Multidimensional Model for Social Media Analysis. Business & Information Systems Engineering 66:6 (667-689). DOI: 10.1007/s12599-023-00840-9. Online publication date: 10-Nov-2023
  • (2022) To Be or Not To Be: Twitter Presence among Turkish Diplomats. MGIMO Review of International Relations 15:3 (175-201). DOI: 10.24833/2071-8160-2022-3-84-175-201. Online publication date: 7-Jul-2022
  • (2022) Information networks for COVID-19 according to race/ethnicity. Information Technology and Management 24:2 (147-157). DOI: 10.1007/s10799-022-00360-0. Online publication date: 23-Apr-2022
  • (2021) Use of Social Media Data in Disaster Management: A Survey. Future Internet 13:2 (46). DOI: 10.3390/fi13020046. Online publication date: 12-Feb-2021
  • (2021) T-CREo: A Twitter Credibility Analysis Framework. IEEE Access 9 (32498-32516). DOI: 10.1109/ACCESS.2021.3060623. Online publication date: 2021
  • (2020) Predictive Analytical Model for Microblogging Data Using Asset Bubble Modelling. International Journal of Cognitive Informatics and Natural Intelligence 14:2 (108-118). DOI: 10.4018/IJCINI.2020040107. Online publication date: 1-Apr-2020
  • (2020) The Efficiency of Social Network Services Management in Organizations. An In-Depth Analysis Applying Machine Learning Algorithms and Multiple Linear Regressions. Applied Sciences 10:15 (5167). DOI: 10.3390/app10155167. Online publication date: 27-Jul-2020
