
Microblogs data management: a survey

  • Special Issue Paper
The VLDB Journal

Abstract

Microblogs data is the micro-length user-generated data posted on the web, e.g., tweets, online reviews, and comments on news and social media. It has gained considerable attention in recent years due to its widespread popularity, rich content, and value in several societal applications. Nowadays, microblogs applications span a wide spectrum of interests, including targeted advertising, market reports, news delivery, political campaigns, rescue services, and public health. Consequently, major research efforts have been devoted to managing, analyzing, and visualizing microblogs to support different applications. This paper gives a comprehensive review of major research and system work in microblogs data management. The paper reviews core components that enable large-scale querying and indexing for microblogs data. A dedicated part focuses on system-level issues and ongoing efforts to support microblogs through the rising wave of big data systems. In addition, we review the major research topics that exploit these core data management components to provide innovative and effective analysis and visualization for microblogs, such as event detection, recommendations, automatic geotagging, and user queries. Throughout the different parts, we highlight the challenges, innovations, and future opportunities in microblogs data research.



References

  1. Abdelhaq, H., Gertz, M., Armiti, A.: Efficient online extraction of keywords for localized events in Twitter. GeoInformatica 21(2), 365–388 (2017)

    Google Scholar 

  2. Abdelhaq, H., Sengstock, C., Gertz, M.: EvenTweet: online localized event detection from Twitter. In: VLDB (2013)

  3. Abdelsadek, Y., Chelghoum, K., Herrmann, F., Kacem, I.: Community extraction and visualization in social networks applied to Twitter. Inf. Sci. 424, 204–223 (2018)

    Google Scholar 

  4. Abreu, J., Castro, I., Martínez, C., Oliva, S., Gutiérrez, Y.: UCSC-NLP at SemEval-2017 Task 4: sense n-grams for sentiment analysis in Twitter. In: SemEval-2017 (2017)

  5. Agarwal, A., Xie, B., Vovsha, I., Rambow, O., Passonneau, R.: Sentiment analysis of Twitter data. In: LSM@ACL (2011)

  6. Agarwal, M.K, Bansal, D., Garg, M., Ramamritham, K.: Keyword search on microblog data streams: finding contextual messages in real time. In: EDBT (2016)

  7. Agarwal, M.K., Ramamritham, K., Bhide, M.: Real time discovery of dense clusters in highly dynamic graphs: identifying real world events in highly dynamic environments. PVLDB 5(10), 980–991 (2012)

    Google Scholar 

  8. After Boston Explosions, People Rush to Twitter for Breaking News. http://www.latimes.com/business/technology/la-fi-tn-after-boston-explosions-people-rush-to-twitter-for-breaking-news-20130415,0,3729783.story (2013)

  9. Ahmed, C., ElKorany, A.: Enhancing link prediction in Twitter using semantic user attributes. In: ASONAM, (2015)

  10. Ahn, Z., McLaughlin, M., Hou, J., Nam, Y., Hu, C.W., Park, M., Meng, J.: Social network representation and dissemination of pre-exposure prophylaxis (PrEP): a semantic network analysis of HIV prevention drug on Twitter. In: Springer SCSM (2014)

  11. Ahuja, A., Wei, W., Carley, K.M.: Microblog sentiment topic model. In: ICDM Workshops (2016)

  12. Akbari, M., Xia, H., Nie, L., Chua, T.S: From tweets to wellness: wellness event detection from Twitter streams. In: AAAIz (2016)

  13. Al-Olimat, H., Thirunarayan, K., Shalin, V.L., Sheth, A.P.: Location name extraction from targeted text streams using Gazetteer-based statistical language models. In: COLING (2018)

  14. Alawad, N.A., Aris, A., Stefano, L., Ida, M., Fabrizio, S.: Network-aware recommendations of novel tweets. In: SIGIR (2016)

  15. Alp, Z.Z., Ögüdücü, S.: Influential user detection on Twitter: analyzing effect of focus rate. In: ASONAM (2016)

  16. Alsaedi, N., Burnap, P., Rana, O.: Can we predict a riot? Disruptive event detection using Twitter. ACM TOIT 17(2), 18 (2017)

    Google Scholar 

  17. Alsaedi, N., Burnap, P., Rana, O.F.: Automatic summarization of real world events using Twitter. In: ICWSM (2016)

  18. Alsubaiee, S., Altowim, Y., Altwaijry, H., Behm, A., Borkar, V.R., Bu, Y., Carey, M.J., Cetindil, I., Cheelangi, M., Faraaz, K., Gabrielova, E., Grover, R., Heilbron, Z., Kim, Y.S., Li, C., Ok, J.M., Onose, N., Pirzadeh, P., Tsotras, V., Vernica, R., Wen, J., Westmann, T.: AsterixDB: a scalable, open source BDMS. PVLDB 7(14), 1905–1916 (2014)

    Google Scholar 

  19. Apache AsterixDB. http://asterixdb.apache.org/ (2018)

  20. Apache Cassandra. http://cassandra.apache.org/ (2018)

  21. Apache Flink. https://flink.apache.org/ (2018)

  22. Apache Ignite. https://ignite.apache.org/ (2018)

  23. Apache Impala. https://impala.apache.org/ (2018)

  24. Apache Spark. https://spark.apache.org/ (2014)

  25. Apache Spark Streaming. https://spark.apache.org/streaming/ (2018)

  26. Apache Storm. https://storm.apache.org/ (2014)

  27. Apple buys social media analytics firm Topsy Labs. www.bbc.co.uk/news/business-25195534 (2013)

  28. A Nobel Peace Prize for Twitter? www.csmonitor.com/Commentary/Opinion/2009/0706/p09s02-coop.html (2009)

  29. Ardon, S., Bagchi, A., Mahanti, A., Ruhela, A., Seth, A., Tripathy, R.M., Triukose, S.: Spatio-temporal and events based analysis of topic popularity in Twitter. In: CIKM (2013)

  30. Arslan, Y., Birturk, A., Djumabaev, B., Küçük, D.: Real-time Lexicon-based sentiment analysis experiments on Twitter with a mild (more information, less data) approach. In: IEEE Big Data (2017)

  31. Asiaee, A., Tepper, M., Banerjee, A., Sapiro, G.: If you are happy and you know it... Tweet. In: CIKM (2012)

  32. Avudaiappan, N., Herzog, A., Kadam, S., Du, Y., Thatche, J., Safro, I.: Detecting and summarizing emergent events in microblogs and social media streams by dynamic centralities. In: IEEE Big Data (2017)

  33. Babcock, B., Datar, M., Motwani, R.: Load shedding for aggregation queries over data streams. In: ICDE (2004)

  34. Bai, S., Hao, B., Li, A., Yuan, S., Gao, R., Zhu, T.: Predicting big five personality traits of microblog users. In: WI (2013)

  35. Bakliwal, A., Arora, P., Madhappan, S., Kapre, N., Singh, M., Varma, V.: Mining sentiments from tweets. In: WASSA@ACL (2012)

  36. Balikas, G.: TwiSe at SemEval-2017 Task 4: five-point Twitter sentiment classification and quantification. In: SemEval-2017 (2017)

  37. Balikas, G., Moura, S., Amini, M.R.: Multitask learning for fine-grained Twitter sentiment analysis. In: SIGIR (2017)

  38. Bansal, P., Jain, S., Varma, V.: Towards semantic retrieval of hashtags in microblogs. In: WWW Companion (2015)

  39. Barbosa, L., Feng, J.: Robust sentiment detection on Twitter from biased and noisy data. In: COLING (2010)

  40. Bartoletti, M., Lande, S., Massa, A.: Faderank: an incremental algorithm for ranking Twitter users. In: WISE (2016)

  41. Basu, M., Ghosh, K., Das, S., Dey, R., Bandyopadhyay, S., Ghosh, S.: Identifying post-disaster resource needs and availabilities from microblogs. In: ASONAM (2017)

  42. Basu, M., Shandilya, A., Ghosh, K., Ghosh, S.: Automatic matching of resource needs and availabilities in microblogs for post-disaster relief. In: WWW Companion (2018)

  43. Battle, L., Chang, R., Stonebraker, M.: Dynamic prefetching of data tiles for interactive visualization. In: SIGMOD (2016)

  44. Baugh, W.: Bwbaugh: hierarchical sentiment analysis with partial self-training. In: SemEval, vol. 2 (2013)

  45. Becker, L., Erhart, G., Skiba, D., Matula, V.: Avaya: sentiment analysis on twitter with self-training and polarity lexicon expansion. In: SemEval, vol. 2 (2013)

  46. Bermingham, A., Smeaton, A.F.: Classifying sentiment in microblogs: Is brevity an advantage? In: CIKM (2010)

  47. Bian, J., Yang, Y., Chua, T.S.: Multimedia summarization for trending topics in microblogs. In: CIKM (2013)

  48. Bisio, F., Meda, C., Zunino, R., Surlinelli, R., Scillia, E., Ottaviano, A.: Real-time monitoring of Twitter traffic by using semantic networks. In: ASONAM (2015)

  49. Bizid, I., Nayef, N., Boursier, N., Faïz, S., Doucet, A.: Identification of microblogs prominent users during events by learning temporal sequences of features. In: CIKM (2015)

  50. Budak, C., Georgiou, T., Agrawal, D., Abbadi, A.E.: GeoScope: online detection of geo-correlated information trends in social networks. In: VLDB (2014)

  51. Busch, M., Gade, K., Larson, B., Lok, P., Luckenbill, S., Lin, J.: Earlybird: real-time search at Twitter. In: ICDE (2012)

  52. Cai, H., Huang, Z., Srivastava, D., Zhang, Q.: Indexing evolving events from tweet streams. TKDE 27(11), 3001–3015 (2015)

    Google Scholar 

  53. Cao, C.C., She, J., Tong, Y., Chen, L.: Whom to ask? Jury selection for decision making tasks on micro-blog services. PVLDB 5(11), 1495–1506 (2012)

    Google Scholar 

  54. Cao, X., Cong, G., Guo, T., Jensen, C.S., Ooi, B.C.: Efficient processing of spatial group keyword queries. TODS 40(2), 13 (2015)

    MathSciNet  Google Scholar 

  55. Cao, X., Cong, G., Jensen, C.S., Ooi, B.C.: Collective spatial keyword querying. In: SIGMOD (2011)

  56. Cary, A., Wolfson, O., Rishe, N.: Efficient and scalable method for processing top-k spatial boolean queries. In: SSDBM (2010)

  57. Celik, I., Abel, F., Houben, G.J.: Learning semantic relationships between entities in Twitter. In: ICWE (2011)

  58. Chandrasekaran, S., Cooper, S., Deshpande, A., Franklin, M.J., Hellerstein, J.M., Hong, J.M., Krishnamurthy, S., Madden, S., Reiss, F., Shah, M.A.: TelegraphCQ: continuous dataflow processing. In: SIGMOD (2003)

  59. Chavan, H., Mokbel, M.F.: Scout: a GPU-aware system for interactive spatio-temporal data visualization. In: SIGMOD (2017)

  60. Chen, C., Li, F., Ooi, B.C., Wu, S.: TI: an efficient indexing mechanism for real-time search on tweets. In: SIGMOD (2011)

  61. Chen, C.C., Huang, H.H., Chen, H.H.: NLG301 at SemEval-2017 Task 5: fine-grained sentiment analysis on financial microblogs and news. In: SemEval (2017)

  62. Chen, F., Ji, R., Jinsong, S., Cao, D., Gao, Y.: Predicting microblog sentiments via weakly supervised multimodal deep learning. IEEE Trans. Multimed. 20(4), 997–1007 (2018)

    Google Scholar 

  63. Chen, L., Cong, G., Cao, X.: An efficient query indexing mechanism for filtering geo-textual data. In: SIGMOD (2013)

  64. Chen, L., Cong, G., Jensen, C.S., Wu, D.: Spatial keyword query processing: an experimental evaluation. In: VLDB (2013)

  65. Chen, L., Cui, Y., Cong, G., Cao, X.: SOPS: a system for efficient processing of spatial-keyword publish/subscribe. PVLDB 7(13), 1601–1604 (2014)

    Google Scholar 

  66. Chen, X., Li, L., Guandong, X., Yang, Z., Kitsuregawa, M.: Recommending related microblogs: a comparison between topic and WordNet based approaches. In: AAAI (2012)

  67. Chen, X., Sykora, M.D., Jackson, T.W., Elayan, S.: What about mood swings: identifying depression on Twitter with temporal measures of emotions. In: WWW Companion (2018)

  68. Cheng, D., Schretlen, P., Kronenfeld, N., Bozowsky, N., Wright, W.: Tile based visual analytics for Twitter big data exploratory analysis. In: IEEE Big Data (2013)

  69. Christoforaki, M., He, J., Dimopoulos, C., Markowetz, A., Suel, T.: Text versus space: efficient geo-search query processing. In: CIKM (2011)

  70. Clark, S., Wicentwoski, R.: SwatCS: combining simple classifiers with estimated accuracy. In: SemEval@NAACL-HLT (2013)

  71. Cliche, M.: BB\_twtr at SemEval-2017 Task 4: Twitter sentiment analysis with CNNs and LSTMs. arXiv:1704.06125 (2017)

  72. Cong, G., Jensen, C.S.: Querying geo-textual data: spatial keyword queries and beyond. In: SIGMOD (2016)

  73. Cong, G., Jensen, C.S., Dingming, W.: Efficient retrieval of the top-k most relevant spatial web objects. PVLDB 2(1), 337–348 (2009)

    Google Scholar 

  74. Constantin, C., Grossetti, Q., Mouza, Cé., Travers, N.: An homophily-based approach for fast post recommendation in microblogging systems. In: EDBT (2018)

  75. Corrêa  Jr. E.A., Marinho, V.Q., dos Santos, L.B.: Nilc-usp at SemEval-2017 Task 4: a multi-view ensemble for twitter sentiment analysis. arXiv:1704.02263 (2017)

  76. Counts, S., Fisher, K.: Taking it all in?. Visual attention in microblog consumption. In: ICWSM (2011)

  77. Cui, A., Zhang, M., Liu, Y., Ma, S.: Emotion tokens: bridging the gap among multilingual Twitter sentiment analysis. In: Asia Information Retrieval Symposium (2011)

  78. Cui, A., Zhang, M., Liu, Y., Ma, S., Zhang, K.: Discover breaking events with popular Hashtags in Twitter. In: CIKM (2012)

  79. da Silva, N.F.F., Hruschka, E.R., Hruschka Jr., E.R.: Tweet sentiment analysis with classifier ensembles. DSS J. 66, 170–179 (2014)

    Google Scholar 

  80. da Silva, N.F.F., Coletta, L.F.S., Hruschka, E.R.: A survey and comparative study of tweet sentiment analysis via semi-supervised learning. ACM Comput. Surv. 49(1), 15:1–15:26 (2016)

    Google Scholar 

  81. Dang, A., Makki, R., Moh’d, A., Islam, A., Keselj, V., Milios, E.E.: Real time filtering of tweets using Wikipedia concepts and google tri-gram semantic relatedness. In: TREC (2015)

  82. Davidov, D., Tsur, O., Rappoport, A.: Enhanced sentiment learning using Twitter Hashtags and Smileys. In: COLING (2010)

  83. de França Costa, D., da Silva, N.F.F.: INF-UFG at FiQA 2018 Task 1: predicting sentiments and aspects on financial tweets and news headlines. In: WWW Companion (2018)

  84. de Macedo, A.Q., Marinho, L.B., Santos, R.L.T.: Context-aware event recommendation in event-based social networks In: RecSys (2015)

  85. DeBrabant, J., Pavlo, A., Tu, S., Stonebraker, M., Zdonik, S.B.: Anti-caching: a new approach to database management system architecture. In: VLDB (2013)

  86. Deshmane, A.A., Friedrichs, J.: TSA-INF at SemEval-2017 Task 4: an ensemble of deep learning architectures including lexicon features for Twitter sentiment analysis. In: SemEval-2017 (2017)

  87. Dey, K., Shrivastava, R., Kaushik, S.: Twitter stance detection—a subjectivity and sentiment polarity inspired two-phase approach. In: ICDM Workshops (2017)

  88. Dey, K., Shrivastava, R., Kaushik, S., Subramaniam, L.V.: EmTaggeR: a word embedding based novel method for hashtag recommendation on Twitter. In: ICDM Workshops (2017)

  89. Ding, J., Dong, Y., Gao, T., Zhang, Z., Liu, Y.: Sentiment analysis of chinese micro-blog based on classification and rich features. In: Web Information Systems and Applications Conference (2016)

  90. Dong, L., Wei, F., Tan, C., Tang, D., Zhou, M., Xu, K.: Adaptive recursive neural network for target-dependent Twitter sentiment classification. In: ACL (2014)

  91. Doulamis, N.D., Doulamis, A.D., Kokkinos, P.C., Varvarigos, E.M.: Event detection in Twitter microblogging. IEEE Trans. Cybern. 46(12), 2810–2824 (2016)

    Google Scholar 

  92. Dovdon, E., Saias, J.: ej-sa-2017 at SemEval-2017 Task 4: experiments for target oriented sentiment analysis in Twitter. In: SemEval@ACL (2017)

  93. Drescher, C., Wallner, G., Kriglstein, S., Sifa, R., Drachen, A., Pohl, M.: What moves players? Visual data exploration of Twitter and Gameplay data. In: CHI (2018)

  94. Duong-Trung, N., Schilling, N., Schmidt-Thieme, L.: Near real-time geolocation prediction in Twitter streams via matrix factorization based regression. In: CIKM (2016)

  95. Dutt, R., Hiware, K., Ghosh, A., Bhaskaran, R.: SAVITR: a system for real-time location extraction from microblogs during emergencies. In: CoRR. arXiv:1801.07757 (2018)

  96. Dutta, S., Chandra, V., Mehra, K., Das, A.K., Chakraborty, T., Ghosh, S.: Ensemble algorithms for microblog summarization. IEEE Intell. Syst. 33(3), 4–14 (2018)

    Google Scholar 

  97. Effelsberg, W., Härder, T.: Principles of database buffer management. TODS 9(4), 560–595 (1984)

    Google Scholar 

  98. Efstathiades, C., Antoniou, H., Skoutas, D., Vassiliou, Y.: TwitterViz: visualizing and exploring the Twitter sphere. In: SSTD (2015)

  99. Ehsan, H., Sharaf, M.A., Chrysanthis, P.K.: MuVE: efficient multi-objective view recommendation for visual data exploration. In: ICDE (2016)

  100. Eldawy, A., Mokbel, M.F., Jonathan, C.: HadoopViz: a MapReduce framework for extensible visualization of big spatial data. In: ICDE (2016)

  101. Embrace of Social Media Aids Flood Victims in Kashmir. https://www.nytimes.com/2014/09/13/world/asia/embrace-of-social-media-aids-flood-victims-in-kashmir.html (2014)

  102. Enoki, M., Ikawa, Y., Raymond, R.: User community reconstruction using sampled microblogging data. In: WWW Companion (2012)

  103. Erdoğan, A.E., Yilmaz, T., Sert, O.C., Akyüz, M., Özyer, T., Alhajj, R.: From social media analysis to ubiquitous event monitoring: the case of Turkish tweets. In: ASONAM (2017)

  104. Facebook Statistics. http://newsroom.fb.com/company-info/ (2018)

  105. Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. In: PODS (2001)

  106. Faralli, S., Tommaso, G. Di Velardi, P.: Semantic enabled recommender system for micro-blog users. In: ICDM (2016)

  107. Feng, S., Song, K., Wang, D., Ge, Y.: A word-emoticon mutual reinforcement ranking model for building sentiment lexicon from massive collection of microblogs. WWW J. 18(4), 949–967 (2015)

    Google Scholar 

  108. Feng, W., Zhang, C., Zhang, W., Han, J., Wang, J., Aggarwal, C., Huang, J.: STREAMCUBE: hierarchical spatio-temporal hashtag clustering for event exploration over the Twitter stream. In: ICDE (2015)

  109. Forsati, R., Mahdavi, M., Shamsfard, M., Sarwat, M.: Matrix factorization with explicit trust and distrust side information for improved social recommendation. ACM Trans. Inf. Syst. 32(4), 17:1–17:38 (2014)

    Google Scholar 

  110. Ganesh, J., Gupta, M., Varma, V.: Interpretation of semantic tweet representations. In: ASONAM (2017)

  111. Gao, L., Wang, Y., Li, D., Shao, J., Song, J.: Real-time social media retrieval with spatial, temporal and social constraints. Neurocomputing 253, 77–88 (2017)

    Google Scholar 

  112. Gedik, B., Wu, K.L., Yu, P.S., Liu, L.: A load shedding framework and optimizations for M-way windowed stream joins. In: ICDE (2007)

  113. Genc, Y., Sakamoto, Y., Nickerson, J.V.: Discovering context: classifying tweets through a semantic transform based on Wikipedia. In: Springer FAC (2011)

  114. Ghanem, T., Magdy, A., Musleh, M., Ghani, S., Mokbel, M.: VisCAT: spatio-temporal visualization and aggregation of categorical attributes in Twitter data. In: SIGSPATIAL (2014)

  115. Ghiassi, M., Skinner, J., Zimbra, D.: Twitter brand sentiment analysis: a hybrid system using n-gram analysis and dynamic artificial neural network. Expert Syst. Appl. 40(16), 6266–6282 (2013)

    Google Scholar 

  116. Ghosh, S., Sharma, N.K., Benevenuto, F., Ganguly, N., Gummadi, P.K.: Cognos: Crowdsourcing search for topic experts in microblogs. In: SIGIR (2012)

  117. Giachanou, A., Crestani, F.: Like it or not: a survey of Twitter sentiment analysis methods. ACM Comput. Surv. 49(2), 28:1–28:41 (2016)

    Google Scholar 

  118. Gilani, Z., Kochmar, E., Crowcroft, J.: Classification of Twitter accounts into automated agents and human users. In: ASONAM (2017)

  119. Gillani, M., Ilyas, M.U., Saleh, S., Alowibdi, J.S., Aljohani, N.R., Alotaibi, F.S.: Post summarization of microblogs of sporting events. In: WWW Companion (2017)

  120. Go, A., Bhayani, R., Huang, L.: Twitter sentiment classification using distant supervision. Technical report, Stanford University (2009)

  121. Grover, R., Carey, M.: Data ingestion in AsterixDB. In: EDBT (2015)

  122. Gu, Y., Song, J., Liu, W., Zou, L., Yao, Y.: Context aware matrix factorization for event recommendation in event-based social networks. In: WI (2016)

  123. Guha, S., Chakraborty, T., Datta, S., Kumar, M., Varma, V.: TweetGrep: weakly supervised joint retrieval and sentiment analysis of topical tweets. In: ICWSM (2016)

  124. Guilherme, C.R., de Lemos, V.S., Lammel, F., Manssour, I.H., Silveira, M.S., Pase, A.F.: Visualization techniques for the analysis of Twitter users’ behavior. In: ICWSM (2013)

  125. Guo, L., Zhang, D., Li, G., Tan, K.L., Bao, Z.: Location-aware pub/sub system: when continuous moving queries meet dynamic event streams. In: SIGMOD (2015)

  126. Guo, L., Zhang, D., Wang, Y., Huayu, W., Cui, B., Tan, K.-L.: Co2: Inferring personal interests from raw footprints by connecting the offline world with the online world. ACM Trans. Inf. Syst. (TOIS) 36(3), 31 (2018)

    Google Scholar 

  127. Guo, T., Cao, X., Cong, G.: Efficient algorithms for answering the M-closest keywords query. In: SIGMOD (2015)

  128. Guo, T., Feng, K., Cong, G., Bao, Z.: Efficient selection of geospatial data on maps for interactive and visualized exploration. In: SIGMOD (2018)

  129. Gupta, P., Goel, A., Lin, J.J., Sharma, A., Wang, D., Zadeh, R.: WTF: the who to follow service at Twitter. In: WWW (2013)

  130. Gupta, P., Satuluri, V., Grewal, A., Gurumurthy, S., Zhabiuk, V., Li, Q., Lin, J.J.: Real-time Twitter recommendation: online Motif detection in large dynamic graphs. PVLDB 7(13), 1379–1380 (2014)

    Google Scholar 

  131. Hamdan, H., Béchet, F., Bellot, P.: Experiments with DBpedia, WordNet and SentiWordNet as resources for sentiment analysis in micro-blogging. In: SemEval@NAACL-HLT (2013)

  132. Hannon, J., Bennett, M., Smyth, B.: Recommending Twitter users to follow using content and collaborative filtering approaches. In: RecSys (2010)

  133. Hansu, G., Gartrell, M., Zhang, L., Lv, Q., Grunwald, D.: AnchorMF: towards effective event context identification. In: CIKM (2013)

  134. Hao, Y., Lan, Y., Li, Y., Li, C.: XJSA at SemEval-2017 Task 4: a deep system for sentiment classification in Twitter. In: SemEval-2017 (2017)

  135. Harvard Medical School Researchers Awarded Twitter Data Grant. https://hms.harvard.edu/news/harvard-medical-school-researchers-awarded-twitter-data-grant (2014)

  136. Hassan, A., Abbasi, A., Zeng, D.: Twitter sentiment analysis: a bootstrap ensemble framework. In: SocialCom (2013)

  137. He, L., Luo, J.: What makes a pro eating disorder Hashtag: using Hashtags to identify pro eating disorder Tumblr posts and Twitter users. In: IEEE Big Data (2016)

  138. He, Y., Barman, S., Naughton, J.F.: On load shedding in complex event processing. In: ICDT (2014)

  139. He, Y., Lin, C., Gao, W., Wong, K.F.: Tracking sentiment and topic dynamics from social media. In: ICWSM (2012)

  140. Health Department Use of Social Media to Identify Foodborne Illness—Chicago, Illinois, 2013–2014. https://www.cdc.gov/mmwr/preview/mmwrhtml/mm6332a1.htm (2014)

  141. Hecht, B.J., Hong, L., Suh, B., Chi, E.H.: Tweets from Justin Bieber’s heart: the dynamics of the location field in user profiles. In: CHI (2011)

  142. Hoang, T., Cher, P.H., Prasetyo, P.K., Lim, E.P.: Big data: crowdsensing and analyzing micro-event tweets for public transportation insights. In: IEEE (2016)

  143. Hong, L., Ahmed, A., Gurumurthy, S., Smola, A.J., Tsioutsiouliklis, K.: Discovering geographical topics in the Twitter stream. In: WWW (2012)

  144. How Facebook Is Transforming Disaster Response. https://www.wired.com/2016/11/facebook-disaster-response/ (2016)

  145. How Twitter, Facebook, WhatsApp And Other Social Networks Are Saving Lives During Disasters. http://www.huffingtonpost.in/2017/01/31/how-twitter-facebook-whatsapp-and-other-social-networks-are-sa_a_21703026/ (2017)

  146. Htait, A., Fournier, S., Bellot, P.: LSIS at SemEval-2017 Task 4: using adapted sentiment similarity seed words for English and Arabic tweet polarity classification. In: SemEval (2017)

  147. Hu, G., Bhargava, P., Fuhrmann, S., Ellinger, S., Spasojevic, N.: Analyzing users’ sentiment towards popular consumer industries and brands on Twitter. arXiv:1709.07434 (2017)

  148. Hu, Q., Pei, Y., Chen, Q., He, L.: SG++: Word representation with sentiment and negation for Twitter sentiment classification. In: SIGIR (2016)

  149. Hu, X., Tang, L., Liu, H.: Enhancing accessibility of microblogging messages using semantic knowledge. In: CIKM (2011)

  150. Hu, X., Tang, L., Tang, J., Liu, H.: Exploiting social relations for sentiment analysis in microblogging. In: WSDM (2013)

  151. Hu, Y., John, A., Wang, F., Kambhampati, S.: ET-LDA: joint topic modeling for aligning events and their Twitter feedback. In: AAAI, vol. 12 (2012)

  152. Hu, Y., Nian, T., Chen, C.: Mood congruence or mood consistency? examining aggregated Twitter sentiment towards Ads in 2016 super bowl. In: ICWSM (2017)

  153. Hua, T., Chen, F., Zhao, L., Chang-Tien, L., Ramakrishnan, N.: STED: semi-supervised targeted-interest event detectionin in Twitter. In: SIGKDD (2013)

  154. Hua, T., Chen, F., Zhao, L., Lu, C.-T., Ramakrishnan, N.: Automatic targeted-domain spatio-temporal event detection in Twitter. GeoInformatica 20(4), 765–795 (2016)

    Google Scholar 

  155. Hubert, R.B., Estevez, E., Maguitman, A.G., Janowski, T.: Examining government-citizen interactions on Twitter using visual and sentiment analysis. In: DG.O (2018)

  156. Hurricane Harvey Victims Turn to Twitter and Facebook. http://time.com/4921961/hurricane-harvey-twitter-facebook-social-media/ (2017)

  157. In Irma, Emergency Responders’ New Tools: Twitter and Facebook. https://www.wsj.com/articles/for-hurricane-irma-information-officials-post-on-social-media-1505149661 (2017)

  158. Ikawa, Y., Enoki, M., Tatsubori, M.: Location inference using microblog messages. In: WWW (2012)

  159. Itoh, M., Yokoyama, D., Toyoda, M., Tomita, Y., Kawamura, S., Kitsuregawa, M.: Visual exploration of changes in passenger flows and tweets on mega-city metro network. IEEE Trans. Big Data 2(1), 85–99 (2016)

    Google Scholar 

  160. Jabreel, M., Moreno, A.: SiTAKA at SemEval-2017 Task 4: sentiment analysis in twitter based on a rich set of features. In: SemEval (2017)

  161. Japan earthquake: how Twitter and Facebook helped. http://www.telegraph.co.uk/technology/twitter/8379101/Japan-earthquake-how-Twitter-and-Facebook-helped.html (2011)

  162. Jia, J., Li, C., Zhang, X., Li, C., Carey, M.J., Su, S.: Towards interactive analytics and visualization on one billion tweets. In: SIGSPATIAL (2016)

  163. Jiang, J., Lu, H., Yang, B., Cui, B.: Finding top-k local users in geo-tagged social media data. In: ICDE (2015)

  164. Jiang, L., Yu, M., Zhou, M., Liu, X., Zhao, T.: Target-dependent Twitter sentiment classification. In: ACL (2011)

  165. Jianqiang, Z., Xiaolin, G., Xuejun, Z.: Deep convolution neural networks for Twitter sentiment analysis. IEEE Access 6, 23253–23260 (2018)

    Google Scholar 

  166. Jo, Y., Oh, A.H: Aspect and sentiment unification model for online review analysis. In: WSDM (2011)

  167. Jonathan, C., Magdy, A., Mokbel, M.F., Jonathan, A.: GARNET: a holistic system approach for trending queries in microblogs. In: ICDE (2016)

  168. Jones, A.J., Carlson, E.: TwitterViz: a robotics system for remote data visualization. In: ICWSM (2013)

  169. Kallman, R., Kimura, H., Natkins, J., Pavlo, A., Rasin, A., Zdonik, S.B., Jones, E.P.C., Madden, S., Stonebraker, M., Zhang, Y., Hugg, J., Abadi, D.J.: H-store: a high-performance, distributed main memory transaction processing system. PVLDB 1(2), 1496–1499 (2008)

    Google Scholar 

  170. Kalyanam, J., Velupillai, S., Conway, M., Lanckriet, G.: From event detection to storytelling on microblogs. In: ASONAM (2016)

  171. Kaneko, T., Yanai, K.: Visual event mining from the Twitter stream. In: WWW Companion (2016)

  172. Karanasou, M., Ampla, A., Doulkeridis, C., Halkidi, M.: Scalable and real-time sentiment analysis of Twitter data. In: ICDM Workshops (2016)

  173. Kazai, G., Iskander, Y., Daoud, C.: Personalised news and blog recommendations based on user location, Facebook and Twitter user profiling. In: SIGIR (2016)

  174. Kempter, R., Sintsova, V., Musat, C.C., Pu, P.: EmotionWatch: visualizing fine-grained emotions in event-related tweets. In: ICWSM (2014)

  175. Khan, F.H., Bashir, S., Qamar, U.: TOM: Twitter opinion mining framework using hybrid classification scheme. DSS J. 57, 245–257 (2014)

    Google Scholar 

  176. Khatua, A., Khatua, A.: Cricket World Cup 2015: predicting user’s orientation through mix tweets on twitter platform. In: ASONAM (2017)

  177. Khuc, V.N., Shivade, C., Ramnath, R., Ramanathan, J.: SAC: towards building large-scale distributed systems for Twitter sentiment analysis. In: ACM (2012)

  178. Kim, A., Blais, E., Parameswaran, A.G., Indyk, P., Madden, S., Rubinfeld, R.: Rapid sampling for visualizations with ordering guarantees. PVLDB 8(5), 521–532 (2015)

    Google Scholar 

  179. Kim, E., Ihm, H., Myaeng, S.H.: Topic-based place semantics discovered from microblogging text messages. In: WWW Companion (2014)

  180. Kiritchenko, S., Zhu, X., Mohammad, S.M.: Sentiment analysis of short informal texts. JAIR 50, 723–762 (2014)

    Google Scholar 

  181. Kitazawa, T., Yui, M.: Query-based simple and scalable recommender systems with Apache Hivemall. In: RecSys (2018)

  182. Kolovou, A., Kokkinos, F., Fergadis, A., Papalampidi, P., Iosif, E., Malandrakis, N., Palogiannidi, E., Papageorgiou, H., Narayanan, S., Potamianos, A.: Tweester at SemEval-2017 Task 4: fusion of semantic-affective and pairwise classification models for sentiment analysis in Twitter. In: SemEval@ACL (2017)

  183. Kontopoulos, E., Berberidis, C., Dergiades, T., Bassiliades, N.: Ontology-based sentiment analysis of Twitter posts. Expert Syst. Appl. 40(10), 4065–4074 (2013)

    Google Scholar 

  184. Korenek, P., Simko, M.: Sentiment analysis on microblog utilizing appraisal theory. WWW J. 17(4), 847–867 (2014)

    Google Scholar 

  185. Kouloumpis, E., Wilson, T., Moore, J.D.: Twitter sentiment analysis: the good the bad and the OMG! In: ICWSM (2011)

  186. Kowald, D., Pujari, S.C., Lex, E.: Temporal effects on hashtag reuse in twitter: a cognitive-inspired hashtag recommendation approach. In: WWW (2017)

  187. Krumm, J., Horvitz, E.: Eyewitness: identifying local events via space-time signals in Twitter feeds. In: SIGSPATIAL (2015)

  188. Kumamoto, T., Suzuki, T., Wada, H.: Visualizing impression-based preferences of Twitter users. In: SCSM-HCI (2014)

  189. Kumar, A., Sebastian, T.M.: Sentiment analysis on Twitter. IJCSI 9(4), 372 (2012)

    Google Scholar 

  190. Kuramochi, T., Okada, N., Tanikawa, K., Hijikata, Y., Nishida, S.: Applying to Twitter networks of a community extraction method using intersection graph and semantic analysis. In: Springer HCI (2013)

  191. Lacic, E.: Real-time recommendations in a multi-domain environment. In: ACM HT (2016)

  192. Lacic, E., Kowald, D., Parra, D., Kahr, M., Trattner, C.: Towards a scalable social recommender engine for online marketplaces: the case of apache solr. In: WWW Companion (2014)

  193. Lahoti, P., De Francisci Morales, G., Gionis, A.: Finding topical experts in twitter via query-dependent personalized PageRank. In: ASONAM (2017)

  194. Laskari, N.K., Sanampudi, S.K.: TWINA at SemEval-2017 Task 4: Twitter sentiment analysis with ensemble gradient boost tree classifier. In: SemEval-2017 (2017)

  195. Lee, G., Lin, J., Liu, C., Lorek, A., Ryaboy, D.V.: The unified logging infrastructure for data analytics at Twitter. PVLDB 5(12), 1771–1780 (2012)

    Google Scholar 

  196. Lee, T., Park, J.W., Lee, S., Hwang, S.W., Elnikety, S., He, Y.: Processing and optimizing main memory spatial-keyword queries. PVLDB 9(3), 132–143 (2015)

  197. Levandoski, J., Larson, P., Stoica, R.: Identifying hot and cold data in main-memory databases. In: ICDE (2013)

  198. Levandoski, J.J., Sarwat, M., Mokbel, M.F., Ekstrand, M.D.: RecStore: an extensible and adaptive framework for online recommender queries inside the database engine. In: EDBT (2012)

  199. Li, G., Hu, J., Feng, J., Tan, K.L.: Effective location identification from microblogs. In: ICDE (2014)

  200. Li, G., Wang, Y., Wang, T., Feng, J.: Location-aware publish/subscribe. In: KDD (2013)

  201. Li, J., Liao, M., Gao, W., He, Y., Wong, K.F.: Topic extraction from microblog posts using conversation structures. In: ACL (2016)

  202. Li, Q., Shah, S., Nourbakhsh, A., Fang, R., Liu, X.: funSentiment at SemEval-2017 Task 5: fine-grained sentiment analysis on financial microblogs using word vectors built from StockTwits and Twitter. In: SemEval (2017)

  203. Li, R., Lei, K.H., Khadiwala, R., Chang, K.C.C.: TEDAS: a Twitter-based event detection and analysis system. In: ICDE (2012)

  204. Li, Y., Jiang, J., Liu, T., Qiu, M., Sun, X.: Personalized microtopic recommendation on microblogs. ACM TIST 8(6), 77 (2017)

  205. Li, Y., Bao, Z., Li, G., Tan, K.L.: Real time personalized search on social networks. In: ICDE (2015)

  206. Li, Z., Lee, K.C.K., Zheng, B., Lee, W.-C., Lee, D.L., Wang, X.: IR-Tree: an efficient index for geographic document search. TKDE 23(4), 585–599 (2011)

  207. Lim, K.H., Lee, K.E., Kendal, D., Rashidi, L., Naghizade, E., Winter, S., Vasardani, M.: The grass is greener on the other side: understanding the effects of green spaces on Twitter user sentiments. In: WWW Companion (2018)

  208. Lin, J., Kolcz, A.: Large-scale machine learning at Twitter. In: SIGMOD (2012)

  209. Lin, J., Mishne, G.: A study of “Churn” in tweets and real-time search queries. In: ICWSM (2012)

  210. Lingad, J., Karimi, S., Yin, J.: Location extraction from disaster-related microblogs. In: WWW (2013)

  211. Wu, L., Lin, W., Xiao, X., Xu, Y.: LSII: An indexing structure for exact real-time search on microblogs. In: ICDE (2013)

  212. Liu, M., Fu, K., Lu, C.T., Chen, G., Wang, H.: A search and summary application for traffic events detection based on Twitter data. In: SIGSPATIAL (2014)

  213. Liu, N., Li, L., Xu, G., Yang, Z.: Identifying domain-dependent influential microblog users: a post-feature based approach. In: AAAI (2014)

  214. Liu, S., Li, F., Li, F., Cheng, X., Shen, H.: Adaptive co-training SVM for sentiment classification on tweets. In: CIKM (2013)

  215. Liu, S., Zhu, W., Xu, N., Li, F., Cheng, X.Q., Liu, Y., Wang, Y.: Co-training and visualizing sentiment evolvement for tweet events. In: WWW (2013)

  216. Liu, X., Fu, Z., Wei, F., Zhou, M.: Collective nominal semantic role labeling for tweets. In: AAAI (2012)

  217. Liu, X., Li, K., Zhou, M., Xiong, Z.: Enhancing semantic role labeling for tweets using self-training. In: AAAI (2011)

  218. Liu, X., Li, Q., Nourbakhsh, A., Fang, R., Thomas, M., Anderson, K., Kociuba, R., Vedder, M., Pomerville, S., Wudali, R., et al.: Reuters tracer: a large scale system of detecting & verifying real-time news events from Twitter. In: CIKM (2016)

  219. Long, C., Wong, R.C.W., Wang, K., Fu, A.W.C.: Collective spatial keyword queries: a distance owner-driven approach. In: SIGMOD (2013)

  220. Lozić, D., Šarić, D., Tokić, I., Medić, Z., Šnajder, J.: TakeLab at SemEval-2017 Task 4: recent deaths and the power of nostalgia in sentiment analysis in Twitter. In: SemEval-2017 (2017)

  221. Lu, X., Li, P., Ma, H., Wang, S., Xu, A., Wang, B.: Computing and applying topic-level user interactions in microblog recommendation. In: SIGIR (2014)

  222. Ma, R., Zhang, Q., Wang, J., Cui, L., Huang, X.: Mention recommendation for multimodal microblog with cross-attention memory network. In: SIGIR (2018)

  223. Magdy, A., Alarabi, L., Al-Harthi, S., Musleh, M., Ghanem, T., Ghani, S., Mokbel, M.: Taghreed: a system for querying, analyzing, and visualizing geotagged microblogs. In: SIGSPATIAL (2014)

  224. Magdy, A., Alghamdi, R., Mokbel, M.F.: On main-memory flushing in microblogs data management systems. In: ICDE (2016)

  225. Magdy, A., Aly, A.M., Mokbel, M.F., Elnikety, S., He, Y., Nath, S., Aref, W.G.: GeoTrend: spatial trending queries on real-time microblogs. In: SIGSPATIAL (2016)

  226. Magdy, A., Mokbel, M.: Towards a microblogs data management system. In: MDM (2015)

  227. Magdy, A., Mokbel, M.: Microblogs data management and analysis (tutorial). In: ICDE (2016)

  228. Magdy, A., Mokbel, M.: Demonstration of kite: a scalable system for microblogs data management. In: ICDE (2017)

  229. Magdy, A., Mokbel, M.F., Elnikety, S., Nath, S., He, Y.: Mercury: a memory-constrained spatio-temporal real-time search on microblogs. In: ICDE (2014)

  230. Magdy, A., Mokbel, M.F., Elnikety, S., Nath, S., He, Y.: Venus: scalable real-time spatial queries on microblogs with adaptive load shedding. TKDE 28(2), 356–370 (2016)

  231. Magdy, A., Musleh, M., Tarek, K., Alarabi, L., Al-Harthi, S., Elmongui, H.G., Ghanem, T.M., Ghani, S., Mokbel, M.F.: Taqreer: a system for spatio-temporal analysis on microblogs. IEEE Data Eng. Bull. 38(2), 68–76 (2015)

  232. Magnuson, A., Dialani, V., Mallela, D.: Event recommendation using Twitter activity. In: RecSys (2015)

  233. Mahmood, A.R., Aref, W.G., Aly, A.M.: FAST: frequency-aware indexing for spatio-textual data streams. In: ICDE (2018)

  234. Mahmood, A.R., Aref, W.G., Aly, A.M., Tang, M.: Atlas: on the expression of spatial-keyword group queries using extended relational constructs. In: SIGSPATIAL (2016)

  235. Mahmud, J., Nichols, J., Drews, C.: Where is this tweet from? Inferring home locations of Twitter users. In: ICWSM (2012)

  236. Makki, R., de Carvalho, E.J., Soto, A.J., Brooks, S., de Oliveira, M.C.F., Milios, E.E., Minghim, R.: ATR-Vis: visual and interactive information retrieval for parliamentary discussions in Twitter. TKDD 12(1), 3:1–3:33 (2018)

  237. Marcus, A., Bernstein, M.S., Badar, O., Karger, D.R., Madden, S., Miller, R.C.: Tweets as data: demonstration of TweeQL and TwitInfo. In: SIGMOD (2011)

  238. Marcus, A., Bernstein, M.S., Badar, O., Karger, D.R., Madden, S., Miller, R.C.: Twitinfo: aggregating and visualizing microblogs for event exploration. In: CHI (2011)

  239. McCullough, D., Lin, J., Macdonald, C., Ounis, I., McCreadie, R.M.C.: Evaluating real-time search over tweets. In: ICWSM (2012)

  240. McMinn, A.J., Tsvetkov, D., Yordanov, T., Patterson, A., Szk, R., Rodriguez Perez, J.A., Jose, J.M.: An interactive interface for visualizing events on Twitter. In: SIGIR (2014)

  241. Mei, Q., Xu, L., Wondra, M., Su, H., Zhai, C.: Topic sentiment mixture: modeling facets and opinions in weblogs. In: WWW (2007)

  242. Meij, E., Weerkamp, W., de Rijke, M.: Adding semantics to microblog posts. In: WSDM (2012)

  243. Metwally, A., Agrawal, D., Abbadi, A.E.: Efficient computation of frequent and top-k elements in data streams. In: ICDT (2005)

  244. Miranda-Jiménez, S., Graff, M., Tellez, E.S., Moctezuma, D.: INGEOTEC at SemEval 2017 Task 4: A B4MSA ensemble based on genetic programming for Twitter sentiment analysis. In: SemEval (2017)

  245. Mishne, G., Dalton, J., Li, Z., Sharma, A., Lin, J.: Fast data in the era of big data: Twitter’s real-time related query suggestion architecture. In: SIGMOD (2013)

  246. Mishne, G., Lin, J.: Twanchor text: a preliminary study of the value of tweets as anchor text. In: SIGIR (2012)

  247. Mohammad, S.: #Emotional tweets. In: *SEM@NAACL-HLT (2012)

  248. Mohammad, S., Kiritchenko, S., Zhu, X.: NRC-Canada: building the state-of-the-art in sentiment analysis of tweets. In: SemEval@NAACL-HLT (2013)

  249. Mokbel, M., Magdy, A.: Microblogs data management systems: querying, analysis, and visualization (tutorial). In: SIGMOD (2016)

  250. Mokbel, M.F., Aref, W.G.: SOLE: scalable on-line execution of continuous queries on spatio-temporal data streams. VLDB J. 17(5), 971–995 (2008)

  251. Mokbel, M.F.H., Ahmed, A.M.M.: System and method for microblogs data management, provisionally filed in U.S. Patent and Trademark Office on August 31, 2015, Application number: 14/841299. http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=/netahtml/PTO/srchnum.html&r=1&f=G&l=50&s1=20160070754.PGNR

  252. MongoDB. https://www.mongodb.com/ (2018)

  253. Mu, L., Jin, P., Zheng, L., Chen, E.H., Yue, L.: Lifecycle-based event detection from microblogs. In: WWW Companion (2018)

  254. Mulki, H., Haddad, H., Gridach, M., Babaoğlu, I.: Tw-StAR at SemEval-2017 Task 4: sentiment classification of Arabic tweets. In: SemEval-2017 (2017)

  255. Nasim, Z.: IBA-Sys at SemEval-2017 Task 5: fine-grained sentiment analysis on financial microblogs and news. In: SemEval (2017)

  256. New Enhanced Geo-targeting for Marketers. https://blog.twitter.com/2012/new-enhanced-geo-targeting-for-marketers (2012)

  257. New Study Quantifies Use of Social Media in Arab Spring. www.washington.edu/news/2011/09/12/new-study-quantifies-use-of-social-media-in-arab-spring/ (2011)

  258. Nodarakis, N., Sioutas, S., Tsakalidis, A.K., Tzimas, G.: Large scale sentiment analysis on Twitter with spark. In: EDBT Workshops (2016)

  259. One Million Tweet Map. http://onemilliontweetmap.com/ (2016)

  260. Ortega, R., Fonseca, A., Montoyo, A.: SSA-UO: unsupervised Twitter sentiment analysis. In: Joint Conference on Lexical and Computational Semantics (*SEM), vol. 2 (2013)

  261. Ozdikis, O., Senkul, P., Oguztüzün, H.: Semantic expansion of tweet contents for enhanced event detection in Twitter. In: ASONAM (2012)

  262. Pak, A., Paroubek, P.: Twitter as a corpus for sentiment analysis and opinion mining. In: LREC (2010)

  263. Park, Y., Cafarella, M.J., Mozafari, B.: Visualization-aware sampling for very large databases. In: ICDE (2016)

  264. Passant, A., Bojars, U., Breslin, J.G., Hastrup, T., Stankovic, M., Laublet, P.: An overview of SMOB 2: open, semantic and distributed microblogging. In: ICWSM (2010)

  265. Paul, D., Li, F., Teja, M.K., Yu, X., Frost, R.: Compass: spatio temporal sentiment analysis of US election what Twitter says! In: SIGKDD (2017)

  266. Penagos, C.R., Batalla, J.A., Codina-Filbà, J., Narbona, D.G., Grivolla, J., Lambert, P., Saurí, R.: FBM: combining lexicon-based ML and heuristics for social media polarities. In: SemEval@NAACL-HLT (2013)

  267. Peng, M., Zhu, J., Wang, H., Li, X., Zhang, Y., Zhang, X., Tian, G.: Mining event-oriented topics in microblog stream with unsupervised multi-view hierarchical embedding. TKDD 12(3), 38 (2018)

  268. Phelan, O., McCarthy, K., Smyth, B.: Using Twitter to recommend real-time topical news. In: RecSys (2009)

  269. Popescu, A.M., Pennacchiotti, M.: Detecting controversial events from Twitter. In: CIKM (2010)

  270. Prediction, Optimization and Control for Information Propagation on Networks: A Differential Equation and Mass Transportation Based Approach. https://www.nsf.gov/awardsearch/showAward?AWD_ID=1620342 (2017)

  271. Presto. http://prestodb.io/ (2018)

  272. Public Health Emergency, Department of Health and Human Services. http://nowtrending.hhs.gov/ (2015)

  273. Qadir, A., Mendes, P.N., Gruhl, D., Lewis, N.: Semantic lexicon induction from Twitter with pattern relatedness and flexible term length. In: AAAI (2015)

  274. Qian, Y., Tang, J., Yang, Z., Huang, B., Wei, W., Carley, K.M.: A probabilistic framework for location inference from social media. In: CoRR. arXiv:1702.07281 (2017)

  275. Qiu, L., Lei, Q., Zhang, Z.: Advanced sentiment classification of Tibetan microblogs on smart campuses based on multi-feature fusion. IEEE Access 6, 17896–17904 (2018)

  276. Rajendram, S.M., Mirnalinee, T.T., et al.: SSN_MLRG1 at SemEval-2017 Task 4: sentiment analysis in Twitter using multi-kernel gaussian process classifier. In: SemEval (2017)

  277. Ramage, D., Dumais, S.T., Liebling, D.J.: Characterizing microblogs with topic models. In: ICWSM (2010)

  278. Ranganathan, J., Irudayaraj, A.S., Tzacheva, A.A.: Action rules for sentiment analysis on Twitter data using spark. In: ICDM Workshops (2017)

  279. Redis. https://redis.io/ (2018)

  280. Ren, Y., Zhang, Y., Zhang, M., Ji, D.: Context-sensitive Twitter sentiment classification using neural network. In: AAAI (2016)

  281. Ren, Y., Zhang, Y., Zhang, M., Ji, D.: Improving Twitter sentiment classification using topic-enriched multi-prototype word embeddings. In: AAAI (2016)

  282. Ribeiro, M.H., Calais, P.H., Santos, Y.A., Almeida, V.A.F., Meira, W. Jr.: Characterizing and detecting hateful users on Twitter. In: CoRR. arXiv:1803.08977 (2018)

  283. Rios, M., Lin, J.J.: Visualizing the “Pulse” of world cities on Twitter. In: ICWSM (2013)

  284. Rios, R.A., Pagliosa, P.A., Ishii, R.P., de Mello, R.F.: TSViz: a data stream architecture to online collect, analyze, and visualize tweets. In: SAC (2017)

  285. Ritter, A., Etzioni, O., Clark, S., et al.: Open domain event extraction from Twitter. In: SIGKDD (2012)

  286. RocksDB. https://rocksdb.org/ (2018)

  287. Romero, S., Becker, K.: A framework for event classification in tweets based on hybrid semantic enrichment. Expert Syst. Appl. 118, 522–538 (2019)

  288. Rozental, A., Fleischer, D.: Amobee at SemEval-2017 Task 4: deep learning system for sentiment detection on Twitter. arXiv:1705.01306 (2017)

  289. Rudra, K., Ghosh, S., Ganguly, N., Goyal, P., Ghosh, S.: Extracting situational information from microblogs during disaster events: a classification-summarization approach. In: CIKM (2015)

  290. Rudra, K., Goyal, P., Ganguly, N., Mitra, P., Imran, M.: Identifying sub-events and summarizing disaster-related information from microblogs. In: SIGIR (2018)

  291. Ryoo, K., Moon, S.: Inferring Twitter user locations with 10 km accuracy. In: WWW Companion (2014)

  292. Sakaki, T., Okazaki, M., Matsuo, Y.: Earthquake shakes Twitter users: real-time event detection by social sensors. In: WWW (2010)

  293. Sang, J., Lu, D., Xu, C.: A probabilistic framework for temporal user modeling on microblogs. In: CIKM (2015)

  294. Sankaranarayanan, J., Samet, H., Teitler, B.E., Lieberman, M.D., Sperling, J.: TwitterStand: news in tweets. In: SIGSPATIAL (2009)

  295. Sarwat, M.: Recdb: towards DBMS support for online recommender systems. In: Proceedings of the ACM SIGMOD/PODS PhD Symposium 2012, Scottsdale, AZ, USA, May 20, 2012, pp. 33–38 (2012)

  296. Sarwat, M., Avery, J.L., Mokbel, M.F.: A RecDB in action: recommendation made easy in relational databases. PVLDB 6(12), 1242–1245 (2013)

  297. Sarwat, M., Avery, J.L., Mokbel, M.F.: RECATHON: a middleware for context-aware recommendation in database systems. In: MDM (2015)

  298. Sarwat, M., Moraffah, R., Mokbel, M.F., Avery, J.L.: Database system support for personalized recommendation applications. In: ICDE (2017)

  299. Satapathy, R., Guerreiro, C., Chaturvedi, I., Cambria, E.: Phonetic-based microtext normalization for Twitter sentiment analysis. In: ICDM Workshops (2017)

  300. Sharma, A., Jiang, J., Bommannavar, P., Larson, B., Lin, J.: GraphJet: real-time content recommendations at Twitter. In: VLDB, pp. 1281–1292 (2016)

  301. Si, J., Mukherjee, A., Liu, B., Li, Q., Li, H., Deng, X.: Exploiting topic based Twitter sentiment for stock prediction. In: ACL, vol. 2 (2013)

  302. Sijtsma, B., Qvarfordt, P., Chen, F.: Tweetviz: visualizing tweets for business intelligence. In: SIGIR (2016)

  303. Singh, V.K., Gao, J.R.: Situation detection and control using spatio-temporal analysis of microblogs. In: WWW (2010)

  304. Sina Weibo, China Twitter, comes to rescue amid flooding in Beijing. http://thenextweb.com/asia/2012/07/23/sina-weibo-chinas-twitter-comes-to-rescue-amid-flooding-in-beijing/ (2012)

  305. Skovsgaard, A., Sidlauskas, D., Jensen, C.S.: Scalable top-k spatio-temporal term querying. In: ICDE (2014)

  306. Smith, K.S., McCreadie, R., Macdonald, C., Ounis, I.: Analyzing disproportionate reaction via comparative multilingual targeted sentiment in Twitter. In: ASONAM (2017)

  307. Soto, A.J., Brooks, S., Raheleh, M., Milios, E.E.: Twitter message recommendation based on user interest profiles. In: ASONAM (2016)

  308. Sparsity Models for Forecasting Spatio-Temporal Human Dynamics. https://www.nsf.gov/awardsearch/showAward?AWD_ID=1737770 (2017)

  309. Søgaard, A., Plank, B., Alonso, H.M.: Using frame semantics for knowledge extraction from Twitter. In: AAAI (2015)

  310. Song, K., Chen, L., Gao, W., Feng, S., Wang, D., Zhang, C.: Persentiment: a personalized sentiment classification system for microblog users. In: WWW Companion (2016)

  311. Sotiropoulos, D.N., Kounavis, C.D., Giaglis, G.M.: Semantically meaningful group detection within sub-communities of Twitter blogosphere: a topic oriented multi-objective clustering approach. In: ASONAM (2013)

  312. Soulier, L., Tamine, L., Nguyen, G.-H.: Answering Twitter questions: a model for recommending answerers through social collaboration. In: CIKM (2016)

  313. Speriosu, M., Sudan, N., Upadhyay, S., Baldridge, J.: Twitter polarity classification with label propagation over lexical links and the follower graph. In: Workshop on Unsupervised Learning in NLP (2011)

  314. Steiger, E., Resch, B., Zipf, A.: Exploration of spatiotemporal and semantic clusters of Twitter data using unsupervised neural networks. IJGIS 30(9), 1694–1716 (2016)

  315. Stonebraker, M., Weisberg, A.: The VoltDB main memory DBMS. IEEE Data Eng. Bull. 36(2), 21–27 (2013)

  316. Sundararaman, D., Srinivasan, S.: Twigraph: discovering and visualizing influential words between Twitter profiles. In: Social Informatics (2017)

  317. Symeonidis, S., Effrosynidis, D., Kordonis, J., Arampatzis, A.: DUTH at SemEval-2017 Task 4: a voting classification approach for Twitter sentiment analysis. In: SemEval (2017)

  318. Symeonidis, S., Kordonis, J., Effrosynidis, D., Arampatzis, A.: DUTH at SemEval-2017 Task 5: sentiment predictability in financial microblogging and news articles. In: SemEval (2017)

  319. Tabari, N., Seyeditabari, A., Zadrozny, W.: SentiHeros at SemEval-2017 Task 5: an application of sentiment analysis on financial tweets. In: SemEval (2017)

  320. Tan, C., Lee, L., Tang, J., Jiang, L., Zhou, M., Li, P.: User-level sentiment analysis incorporating social networks. In: SIGKDD (2011)

  321. Tang, D., Wei, F., Qin, B., Liu, T., Zhou, M.: Coooolll: a deep learning system for Twitter sentiment classification. In: SemEval@COLING (2014)

  322. Tang, D., Wei, F., Yang, N., Zhou, M., Liu, T., Qin, B.: Learning sentiment-specific word embedding for Twitter sentiment classification. In: ACL (2014)

  323. Thelwall, M., Buckley, K., Paltoglou, G.: Sentiment strength detection for the social web. JASIST 63(1), 163–173 (2012)

  324. Topsy Analytics: Find the insights that matter. www.topsy.com (2014)

  325. Turet, J.G., Costa, A.P.C.S.: Big data analytics to improve the decision-making process in public safety: a case study in Northeast Brazil. In: Springer ICDSST (2018)

  326. Tweet Complete Index. https://blog.twitter.com/engineering/en_us/a/2014/building-a-complete-tweet-index.html

  327. TweetTracker: track, analyze, and understand activity on Twitter. tweettracker.fulton.asu.edu/ (2014)

  328. Twitter and Informal Science Learning and Engagement. https://www.nsf.gov/awardsearch/showAward?AWD_ID=1438898 (2017)

  329. The Power of Images: A Computational Investigation of Political Mobilization via Social Media. https://www.nsf.gov/awardsearch/showAward?AWD_ID=1727459 (2017)

  330. Twitter Data Changing Future of Population Research. http://news.psu.edu/story/474782/2017/07/17/research/twitter-data-changing-future-population-research (2017)

  331. Twitter Statistics. https://about.twitter.com/company (2018)

  332. The Twitter War: Social Media’s Role in Ukraine Unrest. news.nationalgeographic.com/news/2014/05/140510-ukraine-odessa-russia-kiev-twitter-world/ (2014)

  333. Twitter a Big Winner in 2012 Presidential Election. https://www.computerworld.com/article/2493332/social-media/twitter-a-big-winner-in-2012-presidential-election.html (2012)

  334. Topsy Analytics for Twitter Political Index. https://blog.twitter.com/official/en_us/a/2012/a-new-barometer-for-the-election.html

  335. Understanding Social and Geographical Disparities in Disaster Resilience Through the Use of Social Media. https://www.nsf.gov/awardsearch/showAward?AWD_ID=1620451 (2017)

  336. Vesdapunt, N., Garcia-Molina, H.: Identifying users in social networks with limited information. In: ICDE (2015)

  337. Vo, D.T., Zhang, Y.: Target-dependent Twitter sentiment classification with rich automatic features. In: IJCAI (2015)

  338. VoltDB. https://www.voltdb.com/ (2018)

  339. Vosecky, J., Jiang, D., Leung, K.W.-T., Xing, K., Ng, W.: Integrating social and auxiliary semantics for multifaceted topic modeling in Twitter. ACM TOIT 14(4), 27:1–27:24 (2014)

  340. Vydiswaran, V.G.V., Romero, D.M., Zhao, X., Yu, D., Gomez-Lopez, I.N., Lu, J.X., Iott, B., Baylin, A., Clarke, P., Berrocal, V.J., et al.: “Bacon Bacon Bacon”: food-related tweets and sentiment in metro detroit. In: ICWSM (2018)

  341. Wakamiya, S., Jatowt, A., Kawai, Y., Akiyama, T.: Analyzing global and pairwise collective spatial attention for geo-social event detection in microblogs. In: WWW Companion (2016)

  342. Wang, M., Chu, B., Liu, Q., Zhou, X.: YNUDLG at SemEval-2017 Task 4: A GRU-SVM model for sentiment classification and quantification in Twitter. In: SemEval-2017 (2017)

  343. Wang, X., Zhang, Y., Zhang, W., Lin, X., Wang, W.: AP-Tree: efficiently support continuous spatial-keyword queries over stream. In: ICDE (2015)

  344. Wang, X., Wei, F., Liu, X., Zhou, M., Zhang, M.: Topic sentiment analysis in twitter: a graph-based hashtag sentiment classification approach. In: CIKM (2011)

  345. Wang, Y., Liu, J., Huang, Y., Feng, X.: Using hashtag graph-based topic model to connect semantically-related words without co-occurrence in microblogs. TKDE 28(7), 1919–1933 (2016)

  346. Wang, Y., Siriaraya, P., Nakaoka, Y., Sakata, H., Kawai, Y., Akiyama, T.: A Twitter-based culture visualization system by analyzing multilingual geo-tagged tweets. In: ICADL (2018)

  347. Wang, Z., Zhang, Y., Li, Y., Wang, Q., Xia, F.: Exploiting social influence for context-aware event recommendation in event-based social networks. In: INFOCOM (2017)

  348. Watanabe, K., Ochi, M., Okabe, M., Onai, R.: Jasmine: a real-time local-event detection system based on geolocation information propagated to microblogs. In: CIKM (2011)

  349. Weber, I., Garimella, V.R.K.: Visualizing user-defined, discriminative geo-temporal Twitter activity. In: ICWSM (2014)

  350. Welch, M.J., Schonfeld, U., He, D., Cho, J.: Topical semantics of Twitter links. In: WSDM (2011)

  351. Wu, F., Huang, Y.: Personalized microblog sentiment classification via multi-task learning. In: AAAI (2016)

  352. Wu, S., Gong, L., Rand, W., Raschid, L.: Making recommendations in a microblog to improve the impact of a focal user. In: RecSys (2012)

  353. Wu, X., Bartram, L., Shaw, C.: Plexus: an interactive visualization tool for analyzing public emotions from Twitter data. In: CoRR. arXiv:1701.06270 (2017)

  354. Wu, Y.: Language E-learning based on learning analytics in big data era. In: International Conference on Big Data and Education (2018)

  355. Xiang, B., Zhou, L.: Improving Twitter sentiment analysis with topic-based mixture modeling and semi-supervised training. In: ACL, vol. 2 (2014)

  356. Xie, Q., Zhang, X., Li, Z., Zhou, X.: Optimizing cost of continuous overlapping queries over data streams by filter adaption. TKDE 28(5), 1258–1271 (2016)

  357. Xing, C., Wang, Y., Liu, J., Huang, Y., Ma, W.Y.: Hashtag-based sub-event discovery using mutually generative LDA in Twitter. In: AAAI, pp. 2666–2672 (2016)

  358. Xiong, X., Mokbel, M.F., Aref, W.G.: SEA-CNN: scalable processing of continuous K-nearest neighbor queries in spatio-temporal databases. In: ICDE (2005)

  359. Yang, T.H., Tseng, T.H., Chen, C.P.: deepSA at SemEval-2017 Task 4: interpolated deep neural networks for sentiment analysis in Twitter. In: SemEval (2017)

  360. Yao, J., Cui, B., Xue, Z., Liu, Q.: Provenance-based indexing support in micro-blog platforms. In: ICDE (2012)

  361. Yen, A.Z., Huang, H.H., Chen, H.H.: Detecting personal life events from Twitter by multi-task LSTM. In: WWW Companion (2018)

  362. Yin, H., Cui, B., Chen, L., Hu, Z., Zhang, C.: Modeling location-based user rating profiles for personalized recommendation. TKDD 9(3), 19:1–19:41 (2015)

  363. Yin, Y., Song, Y., Zhang, M.: NNEMBs at SemEval-2017 Task 4: neural Twitter sentiment classification: a simple ensemble method with different embeddings. In: SemEval (2017)

  364. Yu, Y., Wan, X., Zhou, X.: User embedding for scholarly microblog recommendation. In: ACL, vol. 2 (2016)

  365. Yu, Z., Wang, Z., Chen, L., Guo, B., Li, W.: Featuring, detecting, and visualizing human sentiment in Chinese micro-blog. TKDD 10(4), 48 (2016)

  366. Zayer, M.A., Gunes, M.H.: Analyzing the use of Twitter to disseminate visual impairments awareness information. In: ASONAM (2017)

  367. Zhang, C., Lei, D., Yuan, Q., Zhuang, H., Kaplan, L., Wang, S., Han, J.: GeoBurst+: effective and real-time local event detection in geo-tagged tweet streams. ACM TIST 9(3), 34 (2018)

  368. Zhang, C., Liu, L., Lei, D., Yuan, Q., Zhuang, H., Hanratty, T., Han, J.: Triovecevent: embedding-based online local event detection in geo-tagged tweet streams. In: SIGKDD (2017)

  369. Zhang, C., Zhou, G., Yuan, Q., Zhuang, H., Zheng, Y., Kaplan, L., Wang, S., Han, J.: Geoburst: real-time local event detection in geo-tagged tweet streams. In: SIGIR (2016)

  370. Zhang, D., Liu, Y., Lawrence, R.D., Chenthamarakshan, V.: Transfer latent semantic learning: microblog mining with less supervision. In: AAAI (2011)

  371. Zhang, D., Chan, C.Y., Tan, K.L.: Processing spatial keyword query as a top-k aggregation query. In: SIGIR (2014)

  372. Zhang, D., Nie, L., Luan, H., Tan, K.-L., Chua, T.-S., Shen, H.T.: Compact indexing and judicious searching for billion-scale microblog retrieval. ACM TOIS 35(3), 27 (2017)

  373. Zhang, D., Tan, K.L., Tung, A.K.H.: Scalable top-k spatial keyword search. In: EDBT (2013)

  374. Zhang, H., Chen, G., Ooi, B.C., Wong, W.F., Wu, S., Xia, Y.: “Anti-caching”-based elastic memory management for big data. In: ICDE (2015)

  375. Zhang, J., Zhang, R., Sun, J., Zhang, Y., Zhang, C.: TrueTop: a sybil-resilient system for user influence measurement on Twitter. IEEE/ACM TON 24(5), 2834–2846 (2016)

  376. Zhang, L., Ghosh, R., Dekhil, M., Hsu, M., Liu, B.: Combining lexicon-based and learning-based methods for Twitter sentiment analysis. HP Laboratories, Technical Report HPL-2011, p. 89 (2011)

  377. Zhang, Y., Szabo, C., Sheng, Q.Z., Fang, X.S.: SNAF: observation filtering and location inference for event monitoring on Twitter. WWW J. 21(2), 311–343 (2018)

  378. Zhang, Y., Fan, Y., Ye, Y., Li, X., Winstanley, E.: Utilizing social media to combat opioid addiction epidemic: automatic detection of opioid users from Twitter. In: AAAI Workshops (2018)

  379. Zhang, Z., Lan, M.: Estimating semantic similarity between expanded query and tweet content for microblog retrieval. In: TREC (2014)

  380. Zhao, J., Lan, M., Zhu, T.: ECNU: expression-and message-level sentiment orientation classification in Twitter using multiple effective features. In: SemEval (2014)

  381. Zhao, J., Gui, X., Tian, F.: A new method of identifying influential users in the micro-blog networks. IEEE Access 5, 3008–3015 (2017)

  382. Zhao, J., Lui, J.C.S., Towsley, D., Wang, P., Guan, X.: Sampling design on hybrid social-affiliation networks. In: ICDE (2015)

  383. Zhao, L., Chen, F., Lu, C.-T., Ramakrishnan, N.: Online spatial event forecasting in microblogs. ACM TSAS 2(4), 15 (2016)

  384. Zhao, W.X., Guo, Y., He, Y., Jiang, H., Wu, Y., Li, X.: We know what you want to buy: a demographic-based system for product recommendation on microblogs. In: KDD (2014)

  385. Zhao, W.X., Li, S., He, Y., Chang, E.Y., Wen, J.-R., Li, X.: Connecting social media to e-commerce: cold-start product recommendation using microblogging information. TKDE 28(5), 1147–1159 (2016)

  386. Zheng, X., Sun, A., Wang, S., Han, J.: Semi-supervised event-related tweet identification with dynamic keyword generation. In: CIKM (2017)

  387. Zhou, D., Chen, L., He, Y.: An unsupervised framework of exploring events on Twitter: filtering, extraction and categorization. In: AAAI (2015)

  388. Zhou, D., Gao, T., He, Y.: Jointly event extraction and visualization on Twitter via probabilistic modelling. In: ACL, vol. 1 (2016)

  389. Zhou, X., Chen, L.: Event detection over Twitter social media streams. VLDB J. 23(3), 381–400 (2014)

  390. Zhou, Y., Cristea, A.I., Shi, L.: Connecting targets to tweets: semantic attention-based model for target-specific stance detection. In: WISE (2017)

  391. Zhu, R., Wang, B., Yang, X., Zheng, B., Wang, G.: SAP: improving continuous top-K queries over streaming data. In: ICDE (2018)

  392. Zhu, X., Huang, J., Zhu, S., Chen, M., Zhang, C., Li, Z., Dongchuan, H., Chengliang, Z., Li, A., Jia, Y.: NUDTSNA at TREC 2015 microblog track: a live retrieval system framework for social network based on semantic expansion and quality model. In: TREC (2015)

  393. Zini, T., Becker, K., Dias, M.: INF-UFRGS at SemEval-2017 Task 5: a supervised identification of sentiment score in tweets and headlines. In: SemEval (2017)

Author information

Corresponding author

Correspondence to Amr Magdy.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work is partially supported by the National Science Foundation, USA, under Grants IIS-1849971, SES-1831615, and CNS-1837577.

Appendices

Appendix

Orthogonal research directions

This appendix gives an overview of sentiment and semantic analysis in microblogs as an example of an orthogonal research direction, from the natural language processing literature, that makes little use of data management infrastructures. The appendix highlights how new techniques for microlength data differ from the corresponding techniques for traditional long data. For detailed surveys of these topics, the reader can refer to [80, 117].

1.1 Sentiment analysis

Sentiment analysis automatically discovers the polarity of feelings expressed in a chunk of text, e.g., a citizen posts positive or negative opinions about a certain election candidate. Traditional sentiment analysis techniques benefit from the brevity of microblogs, which improves the classification accuracy of user sentiment. As reported in [46], applying a traditional sentiment classification technique to microblogs raises the accuracy by up to 10% for binary sentiment. This boost is a direct advantage of content brevity, which makes positive and negative feelings in user-generated content less ambiguous and easier to detect. However, the brevity of microblogs also introduces challenges and differences compared with traditional data. For example, feature extraction is more challenging due to the many abbreviations and the noise, e.g., extracting meaningful keywords is harder. In addition, whereas sentiment in traditional data is analyzed on three different levels (document level, sentence level, and entity level), the short content of microblogs mostly limits the sentiment scope to a single sentence or a single entity that represents the whole microdocument. Moreover, microblogs come with additional advantageous features that were not available in traditional data, such as links, user information, and user interactions with different topics. Thus, sentiment analysis research on microblogs has addressed a wider variety of challenges than traditional sentiment analysis. In this section, we give an overview of this rich literature.

Fig. 15 An overview of microblogs sentiment analysis literature

Figure 15 depicts an overview of the microblogs sentiment analysis literature. The major techniques fall into four main categories, namely, machine learning techniques, lexical techniques, hybrid techniques, and miscellaneous techniques. Machine learning techniques represent the majority of techniques in the literature. They can be further divided into four sub-categories as depicted in Fig. 15, namely, supervised, classifier ensemble, deep learning, and semi-supervised. The first sub-category uses supervised machine learning, i.e., traditional classifiers [4, 36, 67, 87, 202, 220, 254, 255, 276, 281, 318, 351, 393]. These techniques differ in the classifier type, stages, and features used to distinguish sentiment. The most commonly used classifiers are support vector machines (SVM) [4, 5, 31, 35, 39, 83, 87, 120, 131, 160, 164, 172, 180, 248, 262, 275, 306], (multinomial) naive Bayes (MNB and NB) [31, 35, 120, 131, 247, 262], k-nearest neighbor (kNN) [31, 82], MaxEnt [92, 120, 180], random forest (RF) [89, 319], logistic regression [36, 202, 393], and AdaBoost [185]. The features used include language-based features such as unigrams [5, 35, 120, 164, 247, 262], bigrams [35, 120, 262], trigrams [262], n-grams [82, 180, 185, 248], and POS tags [5, 39, 120, 180, 185, 247, 248, 262]; microblog-specific features [185, 248] such as retweets [39], hashtags [35, 39, 164], emotions [35, 39, 164, 180], and links [35, 39]; and other features such as punctuation-based [5, 82, 180, 248], pattern-based [5, 35, 82, 164, 180, 248], and semantic-based [180, 247] features.
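To make the supervised setting above concrete, the following is a minimal sketch of a multinomial naive Bayes classifier over unigram features. It is not taken from any of the cited systems: the whitespace tokenizer, the add-one smoothing, and the toy posts in `TOY_POSTS` are assumptions made purely for illustration.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase whitespace tokenizer (assumption; real systems also handle hashtags, links, etc.)."""
    return text.lower().split()

def train_nb(labeled_posts):
    """Count documents and unigrams per class from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_posts:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab

def predict_nb(model, text):
    """Return the class with the highest log-posterior under add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training data, for illustration only.
TOY_POSTS = [
    ("love this phone", "pos"),
    ("great battery life", "pos"),
    ("hate the new update", "neg"),
    ("terrible battery drain", "neg"),
]
model = train_nb(TOY_POSTS)
print(predict_nb(model, "love the battery life"))
print(predict_nb(model, "hate this terrible update"))
```

Bigrams, POS tags, or microblog-specific features such as hashtags could be added by extending `tokenize` to emit additional feature strings; the counting and scoring logic would stay the same.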

To enhance the classification accuracy, techniques of the second sub-category combine multiple classifiers in an ensemble [61, 70, 75, 79, 86, 136, 172, 185, 194, 208, 244, 266, 317, 363]. The set of features used is almost identical to that of the single-classifier techniques, while the classification algorithms overlap but are not identical. Specifically, SVM [79, 136, 266], NB [70, 136], MNB [79], logistic regression [79, 136, 208], and AdaBoost [185] are still used, while new classifiers are also introduced, such as neural models [86, 136, 363] and Bayes network [136]. The third sub-category is deep learning, which is an emerging field in machine learning. In the past few years, deep learning has gained increasing popularity, and many learning problems have migrated to deep learning frameworks. Deep learning offers a black box of neural networks that are trained with huge amounts of data and offer better accuracy than traditional classifiers. Microblog platforms generate huge amounts of data daily, which has motivated the use of deep learning techniques for sentiment analysis on microblogs. Existing deep learning techniques are applied to short textual contexts in a two-step fashion [37, 62, 71, 86, 90, 134, 147, 148, 165, 265, 280, 288, 321, 322, 337, 342, 359]. They first learn word embeddings and then apply them to produce representations for the text sentiment.
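One simple way to combine several classifiers, as in the ensemble sub-category above, is majority voting. The sketch below assumes this combination rule for illustration; the cited techniques may combine their classifiers differently. The three keyword-based base classifiers and their word lists are hypothetical stand-ins for trained models such as SVM or naive Bayes.

```python
from collections import Counter

def make_keyword_classifier(positive, negative):
    """Build a toy base classifier from two word sets (hypothetical stand-in for a trained model)."""
    def classify(text):
        tokens = text.lower().split()
        score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
        return "pos" if score >= 0 else "neg"
    return classify

def majority_vote(classifiers, text):
    """Return the label predicted by most classifiers; ties go to the label voted first."""
    votes = Counter(clf(text) for clf in classifiers)
    return votes.most_common(1)[0][0]

# Three base classifiers with different (hypothetical) vocabularies.
classifiers = [
    make_keyword_classifier({"good", "great"}, {"bad", "awful"}),
    make_keyword_classifier({"love", "nice"}, {"slow", "hate"}),
    make_keyword_classifier({"fast"}, {"awful", "slow", "broken"}),
]

print(majority_vote(classifiers, "great but slow"))    # one "pos" vote, two "neg" votes
print(majority_vote(classifiers, "great and broken"))  # two "pos" votes, one "neg" vote
```

Using an odd number of base classifiers avoids ties in the binary case.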

The main limitation of all supervised techniques, whether they use a single classifier, multiple classifiers, or deep learning models, is their sensitivity to dataset size. Their performance relies heavily on manually annotated labels, which are extremely expensive to obtain. To alleviate this problem, distant supervision has been employed, where the labels are generated based on emoticons and hashtags [258, 310]. However, this approach did not perform well. This encouraged the rise of a fourth sub-category of semi-supervised techniques. Semi-supervised techniques rely on both a small set of manually annotated data and unlabeled data to train the model. They can be further divided into three main types as depicted in Fig. 15: graph-based, wrapper-based, and topic-based techniques. The graph-based techniques [77, 313, 320, 344] use label propagation to label the unlabeled training data based on a similarity metric between pairs of nodes in the graph. Then, a classifier is trained and used as in the previous techniques. The wrapper-based techniques [44, 45, 214, 215, 380] rely on either self-training [44, 45, 380] or co-training [214, 215]. In both types, classification is an iterative process: it starts with the initial labeled data, classifies the unlabeled data, and adds the high-confidence classifications to the labeled data for the next iteration, until all data is labeled or the maximum number of iterations is reached. The difference is that self-training uses only one classifier, whereas co-training uses two classifiers with different feature sets to provide two different views of the data. The more confident classification of the two classifiers is added to the labeled data in the next iteration.
The last semi-supervised type is topic-based techniques [11, 123, 139, 166, 241, 301, 355], which extract topic information and sentiment simultaneously, based on the observation that the context of the content affects the sentiment.
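The iterative self-training process described above can be sketched as follows. This is an illustrative outline, not the method of any cited work: the word-count base classifier, the confidence measure, the `threshold` value, and the early stop when no prediction is confident enough are all assumptions.

```python
from collections import Counter

def train(labeled):
    """Toy base classifier: per-class word counts from (text, label) pairs."""
    counts = {"pos": Counter(), "neg": Counter()}
    for text, label in labeled:
        counts[label].update(text.lower().split())
    return counts

def predict(model, text):
    """Return (label, confidence); confidence is the normalized vote margin in [0, 1]."""
    tokens = text.lower().split()
    pos = sum(model["pos"][t] for t in tokens)
    neg = sum(model["neg"][t] for t in tokens)
    if pos + neg == 0:
        return "pos", 0.0  # no evidence at all: zero confidence
    label = "pos" if pos >= neg else "neg"
    return label, abs(pos - neg) / (pos + neg)

def self_train(labeled, unlabeled, threshold=0.6, max_iters=10):
    """Repeatedly move confidently classified posts into the labeled set."""
    labeled = list(labeled)
    remaining = list(unlabeled)
    for _ in range(max_iters):
        if not remaining:
            break  # all data is labeled
        model = train(labeled)
        confident, uncertain = [], []
        for text in remaining:
            label, conf = predict(model, text)
            (confident if conf >= threshold else uncertain).append((text, label))
        if not confident:
            break  # assumption: stop early when nothing new can be labeled
        labeled.extend(confident)
        remaining = [text for text, _ in uncertain]
    return train(labeled), labeled, remaining

# Hypothetical seed labels and unlabeled posts.
seed = [("love it", "pos"), ("hate it", "neg")]
unlabeled = ["love this song", "hate this traffic", "great song", "traffic again"]
final_model, final_labeled, leftover = self_train(seed, unlabeled)
print(dict(final_labeled))
print(leftover)
```

In this toy run, "great song" and "traffic again" cannot be labeled in the first iteration, but become classifiable in the second once "song" and "traffic" have entered the labeled set. Co-training would follow the same loop with two classifiers over different feature sets.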

The second major category in Fig. 15 is lexical techniques [30, 146, 152, 207, 260, 278, 299, 323, 340], where a predefined list of positive and negative words is employed to classify the sentiment of a new microblog. There are two main sub-categories of lexical techniques, namely dictionary-based and corpus-based. The dictionary-based techniques [30, 146, 152, 207, 299, 340] use dictionaries as lexical resources, with approximate lexical matching to account for the noise and abbreviations in microblogs. The corpus-based technique [278] uses statistical or semantic methods to match incoming data with existing lexical resources. The third major category in Fig. 15 is hybrid techniques [107, 115, 175, 177, 182, 189, 376] that combine both machine learning and lexical methods to detect microblogs sentiment. These techniques use lexical terms either to train a machine learning model or to filter data in a first stage, whose output is fed to a classifier for further processing in a second stage.
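A dictionary-based technique with approximate lexical matching, as described above, can be sketched with the Python standard library. The tiny word lists and the similarity `cutoff` below are illustrative assumptions; real lexicons are far larger, and the cited techniques may use other matching schemes.

```python
import difflib

# Hypothetical miniature sentiment lexicon.
POSITIVE = {"good", "great", "love", "awesome"}
NEGATIVE = {"bad", "terrible", "hate", "awful"}
LEXICON = sorted(POSITIVE | NEGATIVE)

def lexicon_sentiment(text, cutoff=0.8):
    """Score a post by fuzzy-matching each token against the lexicon."""
    score = 0
    for token in text.lower().split():
        # Approximate matching tolerates misspellings such as "greaat" or "terible".
        match = difflib.get_close_matches(token, LEXICON, n=1, cutoff=cutoff)
        if not match:
            continue
        score += 1 if match[0] in POSITIVE else -1
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(lexicon_sentiment("greaat phone"))
print(lexicon_sentiment("terible service"))
print(lexicon_sentiment("see you at noon"))
```

A lower `cutoff` catches more misspellings but also raises the risk of matching unrelated short words, so the value needs tuning against real data.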

Other miscellaneous techniques have also been proposed for microblogs sentiment analysis. ConSent [183] uses concept analysis to determine sentiment based on the associated topic. AppSent [184] uses appraisal terms to outperform supervised techniques. SocioSent [150] uses sociological information in the supervised learning process to improve the performance. ChineSentiment [365] proposes a rule-based model for analyzing sentiment features of different linguistic components, and a corresponding methodology for calculating sentiment using emoticon elements as auxiliary affective factors.

Fig. 16 An overview of microblogs semantic analysis literature

1.2 Semantic analysis

Semantic analysis is a popular analysis task that is widely used in microblogs literature for different applications, such as topic modeling [339], knowledge extraction [309], community detection [190, 311], stance detection [390], sentiment analysis [182], event analysis [48, 261], effective microblog retrieval and ranking [379], and user recommendations [106]. This task automates discovering the meaning of a chunk of text by discovering semantic relationships that relate to real-world entities, such as places, persons, and organizations. For example, a text like "Trump to campaign for Cindy Hyde-Smith in Mississippi" can be related to two persons, Trump and Cindy Hyde-Smith, and one place, Mississippi. This relatedness connects the input text to a predefined set of semantic concepts or categories that are commonly extracted from human-contributed content, such as Wikipedia, or professionally maintained ontologies, such as FOAF and DBpedia ontologies. This type of analysis used to be performed on long chunks of text, e.g., news articles, blog posts, or web documents. However, in microblogs, the textual content is very short and contains plenty of abbreviations, informality, and noisy terms. As shown in [242], such brevity hurts the performance of traditional semantic analysis techniques that depend on lexical matching and search-based retrieval in, for example, Wikipedia concepts.

To overcome the brevity problem, a general theme of semantic analysis research on microblogs is exploring different ways to enrich the short textual content of microblogs to enable accurate discovery of semantic relations. Existing techniques in the literature can be categorized into four categories, as depicted in Fig. 16, based on the source of enrichment: external documents-based techniques, machine learning-based techniques, hashtag-based techniques, and lexical techniques. Techniques of the first category [57, 113, 339, 350] depend on linking the short microblog document to external long documents, e.g., news articles or web documents, which allows traditional semantic analysis techniques to be applied with high precision. ToSem [339] performs semantic enrichment based on explicit web links that are included in the microblog to associate the linked web document. Then, it extracts both named entities and top-k terms from the web document to be appended to the microblog as auxiliary terms. NwSem [57] identifies online news articles that are related to the microblog post in order to extract named entities and include them in the user profile as semantic tags. UsrSem [350] explores semantics of user interactions, specifically retweets and links that are embedded in tweets, and their role in inferring notions such as quality of user relationships, trust, and other attributes of user relationships. This could be applied to re-ranking microblogs based on importance, user interest, quality, etc. DiSem [113] maps microblog posts to Wikipedia articles and then uses the Wikipedia ontology for semantic categorization.
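The enrichment idea of appending top-k terms from a linked document can be illustrated with a short sketch. It is only loosely inspired by the description of ToSem above and is not its implementation: ranking terms by raw frequency, the stopword list, and the function names `top_k_terms` and `enrich` are assumptions, and named entity extraction is omitted.

```python
from collections import Counter

# Hypothetical minimal stopword list.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "for", "is", "on"}

def top_k_terms(document, k):
    """Return the k most frequent non-stopword terms of a document."""
    tokens = [t.strip(".,!?;:").lower() for t in document.split()]
    counts = Counter(t for t in tokens if t and t not in STOPWORDS)
    return [term for term, _ in counts.most_common(k)]

def enrich(microblog, linked_document, k=3):
    """Append top-k terms of the linked document that the microblog lacks."""
    existing = set(microblog.lower().split())
    extra = [t for t in top_k_terms(linked_document, k) if t not in existing]
    return microblog + " " + " ".join(extra) if extra else microblog

# Hypothetical linked article text.
article = ("Senate budget vote delayed. The budget talks stalled and "
           "the senate left; budget unresolved.")
print(top_k_terms(article, 2))
print(enrich("Budget deal collapses", article, k=2))
```

The enriched text can then be passed to a traditional semantic analysis technique that expects longer input.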

The second category is machine learning-based techniques [110, 149, 216, 217, 242, 287, 311, 370] that use either: (1) clustering to group related microblogs and use their collective content to semantically label the whole cluster, or (2) classification that exploits annotated training data as an external source of information to learn different semantic classes of new microblogs. TrSem [370] introduces a novel transfer learning approach, namely transfer latent semantic learning, that utilizes a large number of tagged documents with rich information from domain-specific sources to discover latent semantics of the abbreviated text. AccSem [149] clusters related microblogs and uses the collective content of each cluster to automatically assign semantically meaningful labels. The semantic labels are solicited from external knowledge sources, such as Wikipedia and WordNet, based on informative fragments parsed from microblogs contents. AdSem [242] uses SVM and naive Bayes classifiers to enhance the precision of mapping tweets to Wikipedia-based concepts. For this, it obtains an initial ranked list of candidate concepts through lexical matching, language modeling, and traditional techniques. Then, annotated training data is used to train classifiers that further classify microblogs, based on different feature vectors, into the correct semantic category, which significantly boosts both precision and recall. NomSem [216] uses SVM classifiers to identify nominal predicates in tweets. Then, a factor graph for each nominal predicate is constructed and joined with graphs of other predicates so their semantic arguments are jointly resolved. ComSem [311] clusters related microblogs to detect user groups within sub-communities. Then, a probabilistic model is employed to measure the semantic, or topical, coherence of the user group and filter out non-coherent groups.
GeoSem [314] clusters microblogs based on spatial, temporal, and semantic features, including LDA topics, to evaluate the performance of combining different features in retrieving insights from microblogs data. ST-SRL [217] proposes a semi-supervised self-training approach that utilizes a small training dataset to label unlabeled tweets iteratively and thereby increase the training dataset size. Labeled data records with the highest confidence from two different labelers are used to enhance the classification accuracy in the following iterations. VecSem [110] performs a unique study that explores the effect of changing microblog-specific semantic representation features on the performance of semantic prediction. It studies a set of 13 microblog-specific prediction tasks to understand both textual and social aspects of different representations.

The third category is hashtag-based techniques [38, 264, 345]. Hashtags are user-defined tags included in microblog posts, which indicate the discussed topics and enable posts related to the same topics to be searched easily. These hashtags are used in different ways to discover latent semantic content in microblogs. SMOB [264] uses hashtags as seeds to generate potential related links to web documents and ontology entries from both FOAF and DBpedia ontologies. Then, relevant semantic relations to the discovered entities are appended to the microblog. EntSem [38] enriches semantics by retrieving a ranked list of the top-k hashtags that are relevant to a user's query and segmenting them into relevant individual words. Then, it retrieves a set of Wikipedia articles that are related to tweet text, hashtags, and segmented hashtags. HGTM [345] introduces a new topic model that uses a graph structure to capture the semantic relatedness of hashtags to each other. A graph of hashtag relatedness is constructed using probabilistic models; then, related hashtags are grouped in coherent topics.
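Segmenting a hashtag into individual words, one of the steps mentioned above, can be sketched with dictionary-based dynamic programming. This is a generic word-segmentation approach chosen for illustration, not necessarily the one used by EntSem; the vocabulary and the preference for the fewest words are assumptions.

```python
def segment_hashtag(hashtag, vocabulary):
    """Split a hashtag into vocabulary words; return None if no split exists."""
    text = hashtag.lstrip("#").lower()
    n = len(text)
    # best[i] holds a segmentation of text[:i] with the fewest words, or None.
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            word = text[start:end]
            if best[start] is not None and word in vocabulary:
                candidate = best[start] + [word]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n]

# Hypothetical vocabulary; a real system would use a large dictionary or corpus.
VOCAB = {"vote", "election", "elect", "ion", "day", "today", "to"}

print(segment_hashtag("#ElectionDay", VOCAB))
print(segment_hashtag("#VoteToday", VOCAB))
print(segment_hashtag("#xyz", VOCAB))
```

Preferring the fewest words picks "election" over "elect" plus "ion"; a frequency-weighted score would be a natural refinement when a corpus is available.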

The fourth category is lexical techniques [9, 10, 48, 81, 106, 179, 261, 273, 309, 392] that adapt traditional techniques designed for long text to be effective for short textual microblog content. InducSem [273] induces semantic entities using a lexical pattern-based approach that matches microblog text with seed keywords of each semantic category, e.g., food, sports, or vehicles. KnoSem [309] uses lexical resources that include corpus and POS-tagged terms to label tweets with semantic frames for knowledge extraction purposes. EveSem [261] analyzes word co-occurrences to discover relationships among word pairs. Then, such features are used to calculate the pairwise similarity of tweets for event detection purposes. HIVSem [10] uses lexical matching techniques to analyze the presence of an HIV prevention drug on Twitter. PlcSem [179] extracts place semantics through LDA topic modeling from a collection of microblogs to abstract their content through probabilistic models into a set of coherent topics. Then, the extracted place semantics are analyzed for temporal changes, e.g., a sports arena could evolve over time to be a place for concerts and exhibitions. MonSem [48] uses lexical matching to match microblog content with semantic knowledge bases to monitor unexpected events on social media. LikSem [9] uses semantic user attributes to enhance link prediction among social media users. RetSem [392] uses lexical semantic features to enhance microblogs retrieval performance. TriSem [81] uses semantic relevance to filter tweets based on Wikipedia concepts and trigrams. RecSem [106] uses semantic relatedness to recommend users to follow. It links users to Wikipedia through lexical and disambiguation algorithms; then, similar users are recommended.

Cite this article

Magdy, A., Abdelhafeez, L., Kang, Y. et al. Microblogs data management: a survey. The VLDB Journal 29, 177–216 (2020). https://doi.org/10.1007/s00778-019-00569-6
