Microblogs data management: a survey

Magdy, Amr; Abdelhafeez, Laila; Kang, Yunfan; Ong, Eric; Mokbel, Mohamed F.

doi:10.1007/s00778-019-00569-6

Special Issue Paper
Published: 18 September 2019

Volume 29, pages 177–216 (2020)
The VLDB Journal

Amr Magdy¹,
Laila Abdelhafeez¹^na1,
Yunfan Kang¹^na1,
Eric Ong¹ &
Mohamed F. Mokbel²

Abstract

Microblogs data is micro-length user-generated data posted on the web, e.g., tweets, online reviews, and comments on news and social media. It has gained considerable attention in recent years due to its widespread popularity, rich content, and value in several societal applications. Microblogs applications now span a wide spectrum of interests, including targeted advertising, market reports, news delivery, political campaigns, rescue services, and public health. Consequently, major research efforts have been devoted to managing, analyzing, and visualizing microblogs to support these applications. This paper gives a comprehensive review of major research and system work in microblogs data management. It reviews the core components that enable large-scale querying and indexing of microblogs data. A dedicated part focuses on system-level issues and ongoing efforts to support microblogs through the rising wave of big data systems. In addition, we review the major research topics that build on these core data management components to provide innovative and effective analysis and visualization for microblogs, such as event detection, recommendations, automatic geotagging, and user queries. Throughout, we highlight the challenges, innovations, and future opportunities in microblogs data research.

Notes

References

Abdelhaq, H., Gertz, M., Armiti, A.: Efficient online extraction of keywords for localized events in Twitter. GeoInformatica 21(2), 365–388 (2017)
Google Scholar
Abdelhaq, H., Sengstock, C., Gertz, M.: EvenTweet: online localized event detection from Twitter. In: VLDB (2013)
Abdelsadek, Y., Chelghoum, K., Herrmann, F., Kacem, I.: Community extraction and visualization in social networks applied to Twitter. Inf. Sci. 424, 204–223 (2018)
Google Scholar
Abreu, J., Castro, I., Martínez, C., Oliva, S., Gutiérrez, Y.: UCSC-NLP at SemEval-2017 Task 4: sense n-grams for sentiment analysis in Twitter. In: SemEval-2017 (2017)
Agarwal, A., Xie, B., Vovsha, I., Rambow, O., Passonneau, R.: Sentiment analysis of Twitter data. In: LSM@ACL (2011)
Agarwal, M.K, Bansal, D., Garg, M., Ramamritham, K.: Keyword search on microblog data streams: finding contextual messages in real time. In: EDBT (2016)
Agarwal, M.K., Ramamritham, K., Bhide, M.: Real time discovery of dense clusters in highly dynamic graphs: identifying real world events in highly dynamic environments. PVLDB 5(10), 980–991 (2012)
Google Scholar
After Boston Explosions, People Rush to Twitter for Breaking News. http://www.latimes.com/business/technology/la-fi-tn-after-boston-explosions-people-rush-to-twitter-for-breaking-news-20130415,0,3729783.story (2013)
Ahmed, C., ElKorany, A.: Enhancing link prediction in Twitter using semantic user attributes. In: ASONAM, (2015)
Ahn, Z., McLaughlin, M., Hou, J., Nam, Y., Hu, C.W., Park, M., Meng, J.: Social network representation and dissemination of pre-exposure prophylaxis (PrEP): a semantic network analysis of HIV prevention drug on Twitter. In: Springer SCSM (2014)
Ahuja, A., Wei, W., Carley, K.M.: Microblog sentiment topic model. In: ICDM Workshops (2016)
Akbari, M., Xia, H., Nie, L., Chua, T.S: From tweets to wellness: wellness event detection from Twitter streams. In: AAAIz (2016)
Al-Olimat, H., Thirunarayan, K., Shalin, V.L., Sheth, A.P.: Location name extraction from targeted text streams using Gazetteer-based statistical language models. In: COLING (2018)
Alawad, N.A., Aris, A., Stefano, L., Ida, M., Fabrizio, S.: Network-aware recommendations of novel tweets. In: SIGIR (2016)
Alp, Z.Z., Ögüdücü, S.: Influential user detection on Twitter: analyzing effect of focus rate. In: ASONAM (2016)
Alsaedi, N., Burnap, P., Rana, O.: Can we predict a riot? Disruptive event detection using Twitter. ACM TOIT 17(2), 18 (2017)
Google Scholar
Alsaedi, N., Burnap, P., Rana, O.F.: Automatic summarization of real world events using Twitter. In: ICWSM (2016)
Alsubaiee, S., Altowim, Y., Altwaijry, H., Behm, A., Borkar, V.R., Bu, Y., Carey, M.J., Cetindil, I., Cheelangi, M., Faraaz, K., Gabrielova, E., Grover, R., Heilbron, Z., Kim, Y.S., Li, C., Ok, J.M., Onose, N., Pirzadeh, P., Tsotras, V., Vernica, R., Wen, J., Westmann, T.: AsterixDB: a scalable, open source BDMS. PVLDB 7(14), 1905–1916 (2014)
Google Scholar
Apache AsterixDB. http://asterixdb.apache.org/ (2018)
Apache Cassandra. http://cassandra.apache.org/ (2018)
Apache Flink. https://flink.apache.org/ (2018)
Apache Ignite. https://ignite.apache.org/ (2018)
Apache Impala. https://impala.apache.org/ (2018)
Apache Spark. https://spark.apache.org/ (2014)
Apache Spark Streaming. https://spark.apache.org/streaming/ (2018)
Apache Storm. https://storm.apache.org/ (2014)
Apple buys social media analytics firm Topsy Labs. www.bbc.co.uk/news/business-25195534 (2013)
A Nobel Peace Prize for Twitter? www.csmonitor.com/Commentary/Opinion/2009/0706/p09s02-coop.html (2009)
Ardon, S., Bagchi, A., Mahanti, A., Ruhela, A., Seth, A., Tripathy, R.M., Triukose, S.: Spatio-temporal and events based analysis of topic popularity in Twitter. In: CIKM (2013)
Arslan, Y., Birturk, A., Djumabaev, B., Küçük, D.: Real-time Lexicon-based sentiment analysis experiments on Twitter with a mild (more information, less data) approach. In: IEEE Big Data (2017)
Asiaee, A., Tepper, M., Banerjee, A., Sapiro, G.: If you are happy and you know it... Tweet. In: CIKM (2012)
Avudaiappan, N., Herzog, A., Kadam, S., Du, Y., Thatche, J., Safro, I.: Detecting and summarizing emergent events in microblogs and social media streams by dynamic centralities. In: IEEE Big Data (2017)
Babcock, B., Datar, M., Motwani, R.: Load shedding for aggregation queries over data streams. In: ICDE (2004)
Bai, S., Hao, B., Li, A., Yuan, S., Gao, R., Zhu, T.: Predicting big five personality traits of microblog users. In: WI (2013)
Bakliwal, A., Arora, P., Madhappan, S., Kapre, N., Singh, M., Varma, V.: Mining sentiments from tweets. In: WASSA@ACL (2012)
Balikas, G.: TwiSe at SemEval-2017 Task 4: five-point Twitter sentiment classification and quantification. In: SemEval-2017 (2017)
Balikas, G., Moura, S., Amini, M.R.: Multitask learning for fine-grained Twitter sentiment analysis. In: SIGIR (2017)
Bansal, P., Jain, S., Varma, V.: Towards semantic retrieval of hashtags in microblogs. In: WWW Companion (2015)
Barbosa, L., Feng, J.: Robust sentiment detection on Twitter from biased and noisy data. In: COLING (2010)
Bartoletti, M., Lande, S., Massa, A.: Faderank: an incremental algorithm for ranking Twitter users. In: WISE (2016)
Basu, M., Ghosh, K., Das, S., Dey, R., Bandyopadhyay, S., Ghosh, S.: Identifying post-disaster resource needs and availabilities from microblogs. In: ASONAM (2017)
Basu, M., Shandilya, A., Ghosh, K., Ghosh, S.: Automatic matching of resource needs and availabilities in microblogs for post-disaster relief. In: WWW Companion (2018)
Battle, L., Chang, R., Stonebraker, M.: Dynamic prefetching of data tiles for interactive visualization. In: SIGMOD (2016)
Baugh, W.: Bwbaugh: hierarchical sentiment analysis with partial self-training. In: SemEval, vol. 2 (2013)
Becker, L., Erhart, G., Skiba, D., Matula, V.: Avaya: sentiment analysis on twitter with self-training and polarity lexicon expansion. In: SemEval, vol. 2 (2013)
Bermingham, A., Smeaton, A.F.: Classifying sentiment in microblogs: Is brevity an advantage? In: CIKM (2010)
Bian, J., Yang, Y., Chua, T.S.: Multimedia summarization for trending topics in microblogs. In: CIKM (2013)
Bisio, F., Meda, C., Zunino, R., Surlinelli, R., Scillia, E., Ottaviano, A.: Real-time monitoring of Twitter traffic by using semantic networks. In: ASONAM (2015)
Bizid, I., Nayef, N., Boursier, N., Faïz, S., Doucet, A.: Identification of microblogs prominent users during events by learning temporal sequences of features. In: CIKM (2015)
Budak, C., Georgiou, T., Agrawal, D., Abbadi, A.E.: GeoScope: online detection of geo-correlated information trends in social networks. In: VLDB (2014)
Busch, M., Gade, K., Larson, B., Lok, P., Luckenbill, S., Lin, J.: Earlybird: real-time search at Twitter. In: ICDE (2012)
Cai, H., Huang, Z., Srivastava, D., Zhang, Q.: Indexing evolving events from tweet streams. TKDE 27(11), 3001–3015 (2015)
Google Scholar
Cao, C.C., She, J., Tong, Y., Chen, L.: Whom to ask? Jury selection for decision making tasks on micro-blog services. PVLDB 5(11), 1495–1506 (2012)
Google Scholar
Cao, X., Cong, G., Guo, T., Jensen, C.S., Ooi, B.C.: Efficient processing of spatial group keyword queries. TODS 40(2), 13 (2015)
MathSciNet Google Scholar
Cao, X., Cong, G., Jensen, C.S., Ooi, B.C.: Collective spatial keyword querying. In: SIGMOD (2011)
Cary, A., Wolfson, O., Rishe, N.: Efficient and scalable method for processing top-k spatial boolean queries. In: SSDBM (2010)
Celik, I., Abel, F., Houben, G.J.: Learning semantic relationships between entities in Twitter. In: ICWE (2011)
Chandrasekaran, S., Cooper, S., Deshpande, A., Franklin, M.J., Hellerstein, J.M., Hong, J.M., Krishnamurthy, S., Madden, S., Reiss, F., Shah, M.A.: TelegraphCQ: continuous dataflow processing. In: SIGMOD (2003)
Chavan, H., Mokbel, M.F.: Scout: a GPU-aware system for interactive spatio-temporal data visualization. In: SIGMOD (2017)
Chen, C., Li, F., Ooi, B.C., Wu, S.: TI: an efficient indexing mechanism for real-time search on tweets. In: SIGMOD (2011)
Chen, C.C., Huang, H.H., Chen, H.H.: NLG301 at SemEval-2017 Task 5: fine-grained sentiment analysis on financial microblogs and news. In: SemEval (2017)
Chen, F., Ji, R., Jinsong, S., Cao, D., Gao, Y.: Predicting microblog sentiments via weakly supervised multimodal deep learning. IEEE Trans. Multimed. 20(4), 997–1007 (2018)
Google Scholar
Chen, L., Cong, G., Cao, X.: An efficient query indexing mechanism for filtering geo-textual data. In: SIGMOD (2013)
Chen, L., Cong, G., Jensen, C.S., Wu, D.: Spatial keyword query processing: an experimental evaluation. In: VLDB (2013)
Chen, L., Cui, Y., Cong, G., Cao, X.: SOPS: a system for efficient processing of spatial-keyword publish/subscribe. PVLDB 7(13), 1601–1604 (2014)
Google Scholar
Chen, X., Li, L., Guandong, X., Yang, Z., Kitsuregawa, M.: Recommending related microblogs: a comparison between topic and WordNet based approaches. In: AAAI (2012)
Chen, X., Sykora, M.D., Jackson, T.W., Elayan, S.: What about mood swings: identifying depression on Twitter with temporal measures of emotions. In: WWW Companion (2018)
Cheng, D., Schretlen, P., Kronenfeld, N., Bozowsky, N., Wright, W.: Tile based visual analytics for Twitter big data exploratory analysis. In: IEEE Big Data (2013)
Christoforaki, M., He, J., Dimopoulos, C., Markowetz, A., Suel, T.: Text versus space: efficient geo-search query processing. In: CIKM (2011)
Clark, S., Wicentwoski, R.: SwatCS: combining simple classifiers with estimated accuracy. In: SemEval@NAACL-HLT (2013)
Cliche, M.: BB\_twtr at SemEval-2017 Task 4: Twitter sentiment analysis with CNNs and LSTMs. arXiv:1704.06125 (2017)
Cong, G., Jensen, C.S.: Querying geo-textual data: spatial keyword queries and beyond. In: SIGMOD (2016)
Cong, G., Jensen, C.S., Dingming, W.: Efficient retrieval of the top-k most relevant spatial web objects. PVLDB 2(1), 337–348 (2009)
Google Scholar
Constantin, C., Grossetti, Q., Mouza, Cé., Travers, N.: An homophily-based approach for fast post recommendation in microblogging systems. In: EDBT (2018)
Corrêa Jr. E.A., Marinho, V.Q., dos Santos, L.B.: Nilc-usp at SemEval-2017 Task 4: a multi-view ensemble for twitter sentiment analysis. arXiv:1704.02263 (2017)
Counts, S., Fisher, K.: Taking it all in?. Visual attention in microblog consumption. In: ICWSM (2011)
Cui, A., Zhang, M., Liu, Y., Ma, S.: Emotion tokens: bridging the gap among multilingual Twitter sentiment analysis. In: Asia Information Retrieval Symposium (2011)
Cui, A., Zhang, M., Liu, Y., Ma, S., Zhang, K.: Discover breaking events with popular Hashtags in Twitter. In: CIKM (2012)
da Silva, N.F.F., Hruschka, E.R., Hruschka Jr., E.R.: Tweet sentiment analysis with classifier ensembles. DSS J. 66, 170–179 (2014)
Google Scholar
da Silva, N.F.F., Coletta, L.F.S., Hruschka, E.R.: A survey and comparative study of tweet sentiment analysis via semi-supervised learning. ACM Comput. Surv. 49(1), 15:1–15:26 (2016)
Google Scholar
Dang, A., Makki, R., Moh’d, A., Islam, A., Keselj, V., Milios, E.E.: Real time filtering of tweets using Wikipedia concepts and google tri-gram semantic relatedness. In: TREC (2015)
Davidov, D., Tsur, O., Rappoport, A.: Enhanced sentiment learning using Twitter Hashtags and Smileys. In: COLING (2010)
de França Costa, D., da Silva, N.F.F.: INF-UFG at FiQA 2018 Task 1: predicting sentiments and aspects on financial tweets and news headlines. In: WWW Companion (2018)
de Macedo, A.Q., Marinho, L.B., Santos, R.L.T.: Context-aware event recommendation in event-based social networks In: RecSys (2015)
DeBrabant, J., Pavlo, A., Tu, S., Stonebraker, M., Zdonik, S.B.: Anti-caching: a new approach to database management system architecture. In: VLDB (2013)
Deshmane, A.A., Friedrichs, J.: TSA-INF at SemEval-2017 Task 4: an ensemble of deep learning architectures including lexicon features for Twitter sentiment analysis. In: SemEval-2017 (2017)
Dey, K., Shrivastava, R., Kaushik, S.: Twitter stance detection—a subjectivity and sentiment polarity inspired two-phase approach. In: ICDM Workshops (2017)
Dey, K., Shrivastava, R., Kaushik, S., Subramaniam, L.V.: EmTaggeR: a word embedding based novel method for hashtag recommendation on Twitter. In: ICDM Workshops (2017)
Ding, J., Dong, Y., Gao, T., Zhang, Z., Liu, Y.: Sentiment analysis of chinese micro-blog based on classification and rich features. In: Web Information Systems and Applications Conference (2016)
Dong, L., Wei, F., Tan, C., Tang, D., Zhou, M., Xu, K.: Adaptive recursive neural network for target-dependent Twitter sentiment classification. In: ACL (2014)
Doulamis, N.D., Doulamis, A.D., Kokkinos, P.C., Varvarigos, E.M.: Event detection in Twitter microblogging. IEEE Trans. Cybern. 46(12), 2810–2824 (2016)
Google Scholar
Dovdon, E., Saias, J.: ej-sa-2017 at SemEval-2017 Task 4: experiments for target oriented sentiment analysis in Twitter. In: SemEval@ACL (2017)
Drescher, C., Wallner, G., Kriglstein, S., Sifa, R., Drachen, A., Pohl, M.: What moves players? Visual data exploration of Twitter and Gameplay data. In: CHI (2018)
Duong-Trung, N., Schilling, N., Schmidt-Thieme, L.: Near real-time geolocation prediction in Twitter streams via matrix factorization based regression. In: CIKM (2016)
Dutt, R., Hiware, K., Ghosh, A., Bhaskaran, R.: SAVITR: a system for real-time location extraction from microblogs during emergencies. In: CoRR. arXiv:1801.07757 (2018)
Dutta, S., Chandra, V., Mehra, K., Das, A.K., Chakraborty, T., Ghosh, S.: Ensemble algorithms for microblog summarization. IEEE Intell. Syst. 33(3), 4–14 (2018)
Google Scholar
Effelsberg, W., Härder, T.: Principles of database buffer management. TODS 9(4), 560–595 (1984)
Google Scholar
Efstathiades, C., Antoniou, H., Skoutas, D., Vassiliou, Y.: TwitterViz: visualizing and exploring the Twitter sphere. In: SSTD (2015)
Ehsan, H., Sharaf, M.A., Chrysanthis, P.K.: MuVE: efficient multi-objective view recommendation for visual data exploration. In: ICDE (2016)
Eldawy, A., Mokbel, M.F., Jonathan, C.: HadoopViz: a MapReduce framework for extensible visualization of big spatial data. In: ICDE (2016)
Embrace of Social Media Aids Flood Victims in Kashmir. https://www.nytimes.com/2014/09/13/world/asia/embrace-of-social-media-aids-flood-victims-in-kashmir.html (2014)
Enoki, M., Ikawa, Y., Raymond, R.: User community reconstruction using sampled microblogging data. In: WWW Companion (2012)
Erdoğan, A.E., Yilmaz, T., Sert, O.C., Akyüz, M., Özyer, T., Alhajj, R.: From social media analysis to ubiquitous event monitoring: the case of Turkish tweets. In: ASONAM (2017)
Facebook Statistics. http://newsroom.fb.com/company-info/ (2018)
Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. In: PODS (2001)
Faralli, S., Tommaso, G. Di Velardi, P.: Semantic enabled recommender system for micro-blog users. In: ICDM (2016)
Feng, S., Song, K., Wang, D., Ge, Y.: A word-emoticon mutual reinforcement ranking model for building sentiment lexicon from massive collection of microblogs. WWW J. 18(4), 949–967 (2015)
Google Scholar
Feng, W., Zhang, C., Zhang, W., Han, J., Wang, J., Aggarwal, C., Huang, J.: STREAMCUBE: hierarchical spatio-temporal hashtag clustering for event exploration over the Twitter stream. In: ICDE (2015)
Forsati, R., Mahdavi, M., Shamsfard, M., Sarwat, M.: Matrix factorization with explicit trust and distrust side information for improved social recommendation. ACM Trans. Inf. Syst. 32(4), 17:1–17:38 (2014)
Google Scholar
Ganesh, J., Gupta, M., Varma, V.: Interpretation of semantic tweet representations. In: ASONAM (2017)
Gao, L., Wang, Y., Li, D., Shao, J., Song, J.: Real-time social media retrieval with spatial, temporal and social constraints. Neurocomputing 253, 77–88 (2017)
Google Scholar
Gedik, B., Wu, K.L., Yu, P.S., Liu, L.: A load shedding framework and optimizations for M-way windowed stream joins. In: ICDE (2007)
Genc, Y., Sakamoto, Y., Nickerson, J.V.: Discovering context: classifying tweets through a semantic transform based on Wikipedia. In: Springer FAC (2011)
Ghanem, T., Magdy, A., Musleh, M., Ghani, S., Mokbel, M.: VisCAT: spatio-temporal visualization and aggregation of categorical attributes in Twitter data. In: SIGSPATIAL (2014)
Ghiassi, M., Skinner, J., Zimbra, D.: Twitter brand sentiment analysis: a hybrid system using n-gram analysis and dynamic artificial neural network. Expert Syst. Appl. 40(16), 6266–6282 (2013)
Google Scholar
Ghosh, S., Sharma, N.K., Benevenuto, F., Ganguly, N., Gummadi, P.K.: Cognos: Crowdsourcing search for topic experts in microblogs. In: SIGIR (2012)
Giachanou, A., Crestani, F.: Like it or not: a survey of Twitter sentiment analysis methods. ACM Comput. Surv. 49(2), 28:1–28:41 (2016)
Google Scholar
Gilani, Z., Kochmar, E., Crowcroft, J.: Classification of Twitter accounts into automated agents and human users. In: ASONAM (2017)
Gillani, M., Ilyas, M.U., Saleh, S., Alowibdi, J.S., Aljohani, N.R., Alotaibi, F.S.: Post summarization of microblogs of sporting events. In: WWW Companion (2017)
Go, A., Bhayani, R., Huang, L.: Twitter sentiment classification using distant supervision. Technical report, Stanford University (2009)
Grover, R., Carey, M.: Data ingestion in AsterixDB. In: EDBT (2015)
Gu, Y., Song, J., Liu, W., Zou, L., Yao, Y.: Context aware matrix factorization for event recommendation in event-based social networks. In: WI (2016)
Guha, S., Chakraborty, T., Datta, S., Kumar, M., Varma, V.: TweetGrep: weakly supervised joint retrieval and sentiment analysis of topical tweets. In: ICWSM (2016)
Guilherme, C.R., de Lemos, V.S., Lammel, F., Manssour, I.H., Silveira, M.S., Pase, A.F.: Visualization techniques for the analysis of Twitter users’ behavior. In: ICWSM (2013)
Guo, L., Zhang, D., Li, G., Tan, K.L., Bao, Z.: Location-aware pub/sub system: when continuous moving queries meet dynamic event streams. In: SIGMOD (2015)
Guo, L., Zhang, D., Wang, Y., Huayu, W., Cui, B., Tan, K.-L.: Co2: Inferring personal interests from raw footprints by connecting the offline world with the online world. ACM Trans. Inf. Syst. (TOIS) 36(3), 31 (2018)
Google Scholar
Guo, T., Cao, X., Cong, G.: Efficient algorithms for answering the M-closest keywords query. In: SIGMOD (2015)
Guo, T., Feng, K., Cong, G., Bao, Z.: Efficient selection of geospatial data on maps for interactive and visualized exploration. In: SIGMOD (2018)
Gupta, P., Goel, A., Lin, J.J., Sharma, A., Wang, D., Zadeh, R.: WTF: the who to follow service at Twitter. In: WWW (2013)
Gupta, P., Satuluri, V., Grewal, A., Gurumurthy, S., Zhabiuk, V., Li, Q., Lin, J.J.: Real-time Twitter recommendation: online Motif detection in large dynamic graphs. PVLDB 7(13), 1379–1380 (2014)
Google Scholar
Hamdan, H., Béchet, F., Bellot, P.: Experiments with DBpedia, WordNet and SentiWordNet as resources for sentiment analysis in micro-blogging. In: SemEval@NAACL-HLT (2013)
Hannon, J., Bennett, M., Smyth, B.: Recommending Twitter users to follow using content and collaborative filtering approaches. In: RecSys (2010)
Hansu, G., Gartrell, M., Zhang, L., Lv, Q., Grunwald, D.: AnchorMF: towards effective event context identification. In: CIKM (2013)
Hao, Y., Lan, Y., Li, Y., Li, C.: XJSA at SemEval-2017 Task 4: a deep system for sentiment classification in Twitter. In: SemEval-2017 (2017)
Harvard Medical School Researchers Awarded Twitter Data Grant. https://hms.harvard.edu/news/harvard-medical-school-researchers-awarded-twitter-data-grant (2014)
Hassan, A., Abbasi, A., Zeng, D.: Twitter sentiment analysis: a bootstrap ensemble framework. In: SocialCom (2013)
He, L., Luo, J.: What makes a pro eating disorder Hashtag: using Hashtags to identify pro eating disorder Tumblr posts and Twitter users. In: IEEE Big Data (2016)
He, Y., Barman, S., Naughton, J.F.: On load shedding in complex event processing. In: ICDT (2014)
He, Y., Lin, C., Gao, W., Wong, K.F.: Tracking sentiment and topic dynamics from social media. In: ICWSM (2012)
Health Department Use of Social Media to Identify Foodborne Illness—Chicago, Illinois, 2013–2014. https://www.cdc.gov/mmwr/preview/mmwrhtml/mm6332a1.htm (2014)
Hecht, B.J., Hong, L., Suh, B., Chi, E.H.: Tweets from Justin Bieber’s heart: the dynamics of the location field in user profiles. In: CHI (2011)
Hoang, T., Cher, P.H., Prasetyo, P.K., Lim, E.P.: Big data: crowdsensing and analyzing micro-event tweets for public transportation insights. In: IEEE (2016)
Hong, L., Ahmed, A., Gurumurthy, S., Smola, A.J., Tsioutsiouliklis, K.: Discovering geographical topics in the Twitter stream. In: WWW (2012)
How Facebook Is Transforming Disaster Response. https://www.wired.com/2016/11/facebook-disaster-response/ (2016)
How Twitter, Facebook, WhatsApp And Other Social Networks Are Saving Lives During Disasters. http://www.huffingtonpost.in/2017/01/31/how-twitter-facebook-whatsapp-and-other-social-networks-are-sa_a_21703026/ (2017)
Htait, A., Fournier, S., Bellot, P.: LSIS at SemEval-2017 Task 4: using adapted sentiment similarity seed words for English and Arabic tweet polarity classification. In: SemEval (2017)
Hu, G., Bhargava, P., Fuhrmann, S., Ellinger, S., Spasojevic, N.: Analyzing users’ sentiment towards popular consumer industries and brands on Twitter. arXiv:1709.07434 (2017)
Hu, Q., Pei, Y., Chen, Q., He, L.: SG++: Word representation with sentiment and negation for Twitter sentiment classification. In: SIGIR (2016)
Hu, X., Tang, L., Liu, H.: Enhancing accessibility of microblogging messages using semantic knowledge. In: CIKM (2011)
Hu, X., Tang, L., Tang, J., Liu, H.: Exploiting social relations for sentiment analysis in microblogging. In: WSDM (2013)
Hu, Y., John, A., Wang, F., Kambhampati, S.: ET-LDA: joint topic modeling for aligning events and their Twitter feedback. In: AAAI, vol. 12 (2012)
Hu, Y., Nian, T., Chen, C.: Mood congruence or mood consistency? examining aggregated Twitter sentiment towards Ads in 2016 super bowl. In: ICWSM (2017)
Hua, T., Chen, F., Zhao, L., Chang-Tien, L., Ramakrishnan, N.: STED: semi-supervised targeted-interest event detectionin in Twitter. In: SIGKDD (2013)
Hua, T., Chen, F., Zhao, L., Lu, C.-T., Ramakrishnan, N.: Automatic targeted-domain spatio-temporal event detection in Twitter. GeoInformatica 20(4), 765–795 (2016)
Google Scholar
Hubert, R.B., Estevez, E., Maguitman, A.G., Janowski, T.: Examining government-citizen interactions on Twitter using visual and sentiment analysis. In: DG.O (2018)
Hurricane Harvey Victims Turn to Twitter and Facebook. http://time.com/4921961/hurricane-harvey-twitter-facebook-social-media/ (2017)
In Irma, Emergency Responders’ New Tools: Twitter and Facebook. https://www.wsj.com/articles/for-hurricane-irma-information-officials-post-on-social-media-1505149661 (2017)
Ikawa, Y., Enoki, M., Tatsubori, M.: Location inference using microblog messages. In: WWW (2012)
Itoh, M., Yokoyama, D., Toyoda, M., Tomita, Y., Kawamura, S., Kitsuregawa, M.: Visual exploration of changes in passenger flows and tweets on mega-city metro network. IEEE Trans. Big Data 2(1), 85–99 (2016)
Google Scholar
Jabreel, M., Moreno, A.: SiTAKA at SemEval-2017 Task 4: sentiment analysis in twitter based on a rich set of features. In: SemEval (2017)
Japan earthquake: how Twitter and Facebook helped. http://www.telegraph.co.uk/technology/twitter/8379101/Japan-earthquake-how-Twitter-and-Facebook-helped.html (2011)
Jia, J., Li, C., Zhang, X., Li, C., Carey, M.J., Su, S.: Towards interactive analytics and visualization on one billion tweets. In: SIGSPATIAL (2016)
Jiang, J., Lu, H., Yang, B., Cui, B.: Finding top-k local users in geo-tagged social media data. In: ICDE (2015)
Jiang, L., Yu, M., Zhou, M., Liu, X., Zhao, T.: Target-dependent Twitter sentiment classification. In: ACL (2011)
Jianqiang, Z., Xiaolin, G., Xuejun, Z.: Deep convolution neural networks for Twitter sentiment analysis. IEEE Access 6, 23253–23260 (2018)
Google Scholar
Jo, Y., Oh, A.H: Aspect and sentiment unification model for online review analysis. In: WSDM (2011)
Jonathan, C., Magdy, A., Mokbel, M.F., Jonathan, A.: GARNET: a holistic system approach for trending queries in microblogs. In: ICDE (2016)
Jones, A.J., Carlson, E.: TwitterViz: a robotics system for remote data visualization. In: ICWSM (2013)
Kallman, R., Kimura, H., Natkins, J., Pavlo, A., Rasin, A., Zdonik, S.B., Jones, E.P.C., Madden, S., Stonebraker, M., Zhang, Y., Hugg, J., Abadi, D.J.: H-store: a high-performance, distributed main memory transaction processing system. PVLDB 1(2), 1496–1499 (2008)
Google Scholar
Kalyanam, J., Velupillai, S., Conway, M., Lanckriet, G.: From event detection to storytelling on microblogs. In: ASONAM (2016)
Kaneko, T., Yanai, K.: Visual event mining from the Twitter stream. In: WWW Companion (2016)
Karanasou, M., Ampla, A., Doulkeridis, C., Halkidi, M.: Scalable and real-time sentiment analysis of Twitter data. In: ICDM Workshops (2016)
Kazai, G., Iskander, Y., Daoud, C.: Personalised news and blog recommendations based on user location, Facebook and Twitter user profiling. In: SIGIR (2016)
Kempter, R., Sintsova, V., Musat, C.C., Pu, P.: EmotionWatch: visualizing fine-grained emotions in event-related tweets. In: ICWSM (2014)
Khan, F.H., Bashir, S., Qamar, U.: TOM: Twitter opinion mining framework using hybrid classification scheme. DSS J. 57, 245–257 (2014)
Google Scholar
Khatua, A., Khatua, A.: Cricket World Cup 2015: predicting user’s orientation through mix tweets on twitter platform. In: ASONAM (2017)
Khuc, V.N., Shivade, C., Ramnath, R., Ramanathan, J.: SAC: towards building large-scale distributed systems for Twitter sentiment analysis. In: ACM (2012)
Kim, A., Blais, E., Parameswaran, A.G., Indyk, P., Madden, S., Rubinfeld, R.: Rapid sampling for visualizations with ordering guarantees. PVLDB 8(5), 521–532 (2015)
Google Scholar
Kim, E., Ihm, H., Myaeng, S.H.: Topic-based place semantics discovered from microblogging text messages. In: WWW Companion (2014)
Kiritchenko, S., Zhu, X., Mohammad, S.M.: Sentiment analysis of short informal texts. JAIR 50, 723–762 (2014)
Google Scholar
Kitazawa, T., Yui, M.: Query-based simple and scalable recommender systems with Apache Hivemall. In: RecSys (2018)
Kolovou, A., Kokkinos, F., Fergadis, A., Papalampidi, P., Iosif, E., Malandrakis, N., Palogiannidi, E., Papageorgiou, H., Narayanan, S., Potamianos, A.: Tweester at SemEval-2017 Task 4: fusion of semantic-affective and pairwise classification models for sentiment analysis in Twitter. In: SemEval@ACL (2017)
Kontopoulos, E., Berberidis, C., Dergiades, T., Bassiliades, N.: Ontology-based sentiment analysis of Twitter posts. Expert Syst. Appl. 40(10), 4065–4074 (2013)
Google Scholar
Korenek, P., Simko, M.: Sentiment analysis on microblog utilizing appraisal theory. WWW J. 17(4), 847–867 (2014)
Google Scholar
Kouloumpis, E., Wilson, T., Moore, J.D.: Twitter sentiment analysis: the good the bad and the OMG! In: ICWSM (2011)
Kowald, D., Pujari, S.C., Lex, E.: Temporal effects on hashtag reuse in twitter: a cognitive-inspired hashtag recommendation approach. In: WWW (2017)
Krumm, J., Horvitz, E.: Eyewitness: identifying local events via space-time signals in Twitter feeds. In: SIGSPATIAL (2015)
Kumamoto, T., Suzuki, T., Wada, H.: Visualizing impression-based preferences of Twitter users. In: SCSM-HCI (2014)
Kumar, A., Sebastian, T.M.: Sentiment analysis on Twitter. IJCSI 9(4), 372 (2012)
Google Scholar
Kuramochi, T., Okada, N., Tanikawa, K., Hijikata, Y., Nishida, S.: Applying to Twitter networks of a community extraction method using intersection graph and semantic analysis. In: Springer HCI (2013)
Lacic, E.: Real-time recommendations in a multi-domain environment. In: ACM HT (2016)
Lacic, E., Kowald, D., Parra, D., Kahr, M., Trattner, C.: Towards a scalable social recommender engine for online marketplaces: the case of apache solr. In: WWW Companion (2014)
Lahoti, P., De Francisci Morales, G., Gionis, A.: Finding topical experts in twitter via query-dependent personalized PageRank. In: ASONAM (2017)
Laskari, N.K., Sanampudi, S.K.: TWINA at SemEval-2017 Task 4: Twitter sentiment analysis with ensemble gradient boost tree classifier. In: SemEval-2017 (2017)
Lee, G., Lin, J., Liu, C., Lorek, A., Ryaboy, D.V.: The unified logging infrastructure for data analytics at Twitter. PVLDB 5(12), 1771–1780 (2012)
Google Scholar
Lee, T., Park, J.W., Lee, S., Hwang, S.W., Elnikety, S., He, Y.: Processing and optimizing main memory spatial-keyword queries. PVLDB 9(3), 132–143 (2015)
Google Scholar
Levandoski, J., Larson, P., Stoica, R.: Identifying hot and cold data in main-memory databases. In: ICDE (2013)
Levandoski, J.J., Sarwat, M., Mokbel, M.F., Ekstrand, M.D.: RecStore: an extensible and adaptive framework for online recommender queries inside the database engine. In: EDBT (2012)
Li, G., Hu, J., Feng, J., Tan, K.L.: Effective location identification from microblogs. In: ICDE (2014)
Li, G., Wang, Y., Wang, T., Feng, J.: Location-aware publish/subscribe. In: KDD (2013)
Li, J., Liao, M., Gao, W., He, Y., Wong, K.F.: Topic extraction from microblog posts using conversation structures. In: ACL (2016)
Li, Q., Shah, S., Nourbakhsh, A., Fang, R., Liu, X.: funSentiment at SemEval-2017 Task 5: fine-grained sentiment analysis on financial microblogs using word vectors built from StockTwits and Twitter. In: SemEval (2017)
Li, R., Lei, K.H., Khadiwala, R., Chang, K.C.C.: TEDAS: a Twitter-based event detection and analysis system. In: ICDE (2012)
Li, Y., Jiang, J., Liu, T., Qiu, M., Sun, X.: Personalized microtopic recommendation on microblogs. ACM TIST 8(6), 77 (2017)
Li, Y., Bao, Z., Li, G., Tan, K.L.: Real time personalized search on social networks. In: ICDE (2015)
Li, Z., Lee, K.C.K., Zheng, B., Lee, W.-C., Lee, D.L., Wang, X.: IR-Tree: an efficient index for geographic document search. TKDE 23(4), 585–599 (2011)
Lim, K.H., Lee, K.E., Kendal, D., Rashidi, L., Naghizade, E., Winter, S., Vasardani, M.: The grass is greener on the other side: understanding the effects of green spaces on Twitter user sentiments. In: WWW Companion (2018)
Lin, J., Kolcz, A.: Large-scale machine learning at Twitter. In: SIGMOD (2012)
Lin, J., Mishne, G.: A study of “Churn” in tweets and real-time search queries. In: ICWSM (2012)
Lingad, J., Karimi, S., Yin, J.: Location extraction from disaster-related microblogs. In: WWW (2013)
Lingkun, W., Lin, W., Xiao, X., Xu, Y.: LSII: An indexing structure for exact real-time search on microblogs. In: ICDE (2013)
Liu, M., Fu, K., Lu, C.T., Chen, G., Wang, H.: A search and summary application for traffic events detection based on Twitter data. In: SIGSPATIAL (2014)
Liu, N., Li, L., Guandong, X., Yang, Z.: Identifying domain-dependent influential microblog users: a post-feature based approach. In: AAAI (2014)
Liu, S., Li, F., Li, F., Cheng, X., Shen, H.: Adaptive co-training SVM for sentiment classification on tweets. In: CIKM (2013)
Liu, S., Zhu, W., Xu, N., Li, F., Cheng, X.Q., Liu, Y., Wang, Y.: Co-training and visualizing sentiment evolvement for tweet events. In: WWW (2013)
Liu, X., Fu, Z., Wei, F., Zhou, M.: Collective nominal semantic role labeling for tweets. In: AAAI (2012)
Liu, X., Li, K., Zhou, M., Xiong, Z.: Enhancing semantic role labeling for tweets using self-training. In: AAAI (2011)
Liu, X., Li, Q., Nourbakhsh, A., Fang, R., Thomas, M., Anderson, K., Kociuba, R., Vedder, M., Pomerville, S., Wudali, R., et al.: Reuters tracer: a large scale system of detecting & verifying real-time news events from Twitter. In: CIKM (2016)
Long, C., Wong, R.C.W., Wang, K., Fu, A.W.C.: Collective spatial keyword queries: a distance owner-driven approach. In: SIGMOD (2013)
Lozić, D., Šarić, D., Tokić, I., Medić, Z., Šnajder, J.: TakeLab at SemEval-2017 Task 4: recent deaths and the power of nostalgia in sentiment analysis in Twitter. In: SemEval-2017 (2017)
Lu, X., Li, P., Ma, H., Wang, S., Xu, A., Wang, B.: Computing and applying topic-level user interactions in microblog recommendation. In: SIGIR (2014)
Ma, R., Zhang, Q., Wang, J., Cui, L., Huang, X.: Mention recommendation for multimodal microblog with cross-attention memory network. In: SIGIR (2018)
Magdy, A., Alarabi, L., Al-Harthi, S., Musleh, M., Ghanem, T., Ghani, S., Mokbel, M.: Taghreed: a system for querying, analyzing, and visualizing geotagged microblogs. In: SIGSPATIAL (2014)
Magdy, A., Alghamdi, R., Mokbel, M.F.: On main-memory flushing in microblogs data management systems. In: ICDE (2016)
Magdy, A., Aly, A.M., Mokbel, M.F., Elnikety, S., He, Y., Nath, S., Aref, W.G.: GeoTrend: spatial trending queries on real-time microblogs. In: SIGSPATIAL (2016)
Magdy, A., Mokbel, M.: Towards a microblogs data management system. In: MDM (2015)
Magdy, A., Mokbel, M.: Microblogs data management and analysis (tutorial). In: ICDE (2016)
Magdy, A., Mokbel, M.: Demonstration of kite: a scalable system for microblogs data management. In: ICDE (2017)
Magdy, A., Mokbel, M.F., Elnikety, S., Nath, S., He, Y.: Mercury: a memory-constrained spatio-temporal real-time search on microblogs. In: ICDE (2014)
Magdy, A., Mokbel, M.F., Elnikety, S., Nath, S., He, Y.: Venus: scalable real-time spatial queries on microblogs with adaptive load shedding. TKDE 28(2), 356–370 (2016)
Magdy, A., Musleh, M., Tarek, K., Alarabi, L., Al-Harthi, S., Elmongui, H.G., Ghanem, T.M., Ghani, S., Mokbel, M.F.: Taqreer: a system for spatio-temporal analysis on microblogs. IEEE Data Eng. Bull. 38(2), 68–76 (2015)
Magnuson, A., Dialani, V., Mallela, D.: Event recommendation using Twitter activity. In: RecSys (2015)
Mahmood, A.R., Aref, W.G., Aly, A.M.: FAST: frequency-aware indexing for spatio-textual data streams. In: ICDE (2018)
Mahmood, A.R., Aref, W.G., Aly, A.M., Tang, M.: Atlas: on the expression of spatial-keyword group queries using extended relational constructs. In: SIGSPATIAL (2016)
Mahmud, J., Nichols, J., Drews, C.: Where is this tweet from? Inferring home locations of Twitter users. In: ICWSM (2012)
Makki, R., de Carvalho, E.J., Soto, A.J., Brooks, S., de Oliveira, M.C.F., Milios, E.E., Minghim, R.: ATR-Vis: visual and interactive information retrieval for parliamentary discussions in Twitter. TKDD 12(1), 3:1–3:33 (2018)
Marcus, A., Bernstein, M.S., Badar, O., Karger, D.R., Madden, S., Miller, R.C.: Tweets as data: demonstration of TweeQL and TwitInfo. In: SIGMOD (2011)
Marcus, A., Bernstein, M.S., Badar, O., Karger, D.R., Madden, S., Miller, R.C.: Twitinfo: aggregating and visualizing microblogs for event exploration. In: CHI (2011)
McCullough, D., Lin, J., Macdonald, C., Ounis, I., McCreadie, R.M.C.: Evaluating real-time search over tweets. In: ICWSM (2012)
McMinn, A.J., Tsvetkov, D., Yordanov, T., Patterson, A., Szk, R., Rodriguez Perez, J.A., Jose, J.M.: An interactive interface for visualizing events on Twitter. In: SIGIR (2014)
Mei, Q., Xu, L., Wondra, M., Su, H., Zhai, C.: Topic sentiment mixture: modeling facets and opinions in weblogs. In: WWW (2007)
Meij, E., Weerkamp, W., de Rijke, M.: Adding semantics to microblog posts. In: WSDM (2012)
Metwally, A., Agrawal, D., Abbadi, A.E.: Efficient computation of frequent and top-k elements in data streams. In: ICDT (2005)
Miranda-Jiménez, S., Graff, M., Tellez, E.S., Moctezuma, D.: INGEOTEC at SemEval 2017 Task 4: A B4MSA ensemble based on genetic programming for Twitter sentiment analysis. In: SemEval (2017)
Mishne, G., Dalton, J., Li, Z., Sharma, A., Lin, J.: Fast data in the era of big data: Twitter’s real-time related query suggestion architecture. In: SIGMOD (2013)
Mishne, G., Lin, J.: Twanchor text: a preliminary study of the value of tweets as anchor text. In: SIGIR (2012)
Mohammad, S.: #Emotional tweets. In: *SEM@NAACL-HLT (2012)
Mohammad, S., Kiritchenko, S., Zhu, X.: NRC-Canada: building the state-of-the-art in sentiment analysis of tweets. In: SemEval@NAACL-HLT (2013)
Mokbel, M., Magdy, A.: Microblogs data management systems: querying, analysis, and visualization (tutorial). In: SIGMOD (2016)
Mokbel, M.F., Aref, W.G.: SOLE: scalable on-line execution of continuous queries on spatio-temporal data streams. VLDB J. 17(5), 971–995 (2008)
Mokbel, M.F.H., Ahmed, A.M.M.: System and method for microblogs data management, provisionally filed in U.S. Patent and Trademark Office on August 31, 2015, Application number: 14/841299. http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=/netahtml/PTO/srchnum.html&r=1&f=G&l=50&s1=20160070754.PGNR
MongoDB. https://www.mongodb.com/ (2018)
Mu, L., Jin, P., Zheng, L., Chen, E.H., Yue, L.: Lifecycle-based event detection from microblogs. In: WWW Companion (2018)
Mulki, H., Haddad, H., Gridach, M., Babaoğlu, I.: Tw-StAR at SemEval-2017 Task 4: sentiment classification of Arabic tweets. In: SemEval-2017 (2017)
Nasim, Z.: IBA-Sys at SemEval-2017 Task 5: fine-grained sentiment analysis on financial microblogs and news. In: SemEval (2017)
New Enhanced Geo-targeting for Marketers. https://blog.twitter.com/2012/new-enhanced-geo-targeting-for-marketers (2012)
New Study Quantifies Use of Social Media in Arab Spring. www.washington.edu/news/2011/09/12/new-study-quantifies-use-of-social-media-in-arab-spring/ (2011)
Nodarakis, N., Sioutas, S., Tsakalidis, A.K., Tzimas, G.: Large scale sentiment analysis on Twitter with Spark. In: EDBT Workshops (2016)
One Million Tweet Map. http://onemilliontweetmap.com/ (2016)
Ortega, R., Fonseca, A., Montoyo, A.: SSA-UO: unsupervised Twitter sentiment analysis. In: Joint Conference on Lexical and Computational Semantics (*SEM), vol. 2 (2013)
Ozdikis, O., Senkul, P., Oguztüzün, H.: Semantic expansion of tweet contents for enhanced event detection in Twitter. In: ASONAM (2012)
Pak, A., Paroubek, P.: Twitter as a corpus for sentiment analysis and opinion mining. In: LREC (2010)
Park, Y., Cafarella, M.J., Mozafari, B.: Visualization-aware sampling for very large databases. In: ICDE (2016)
Passant, A., Bojars, U., Breslin, J.G., Hastrup, T., Stankovic, M., Laublet, P.: An overview of SMOB 2: open, semantic and distributed microblogging. In: ICWSM (2010)
Paul, D., Li, F., Teja, M.K., Yu, X., Frost, R.: Compass: spatio temporal sentiment analysis of US election what Twitter says! In: SIGKDD (2017)
Penagos, C.R., Batalla, J.A., Codina-Filbà, J., Narbona, D.G., Grivolla, J., Lambert, P., Saurí, R.: FBM: combining lexicon-based ML and heuristics for social media polarities. In: SemEval@NAACL-HLT (2013)
Peng, M., Zhu, J., Wang, H., Li, X., Zhang, Y., Zhang, X., Tian, G.: Mining event-oriented topics in microblog stream with unsupervised multi-view hierarchical embedding. TKDD 12(3), 38 (2018)
Phelan, O., McCarthy, K., Smyth, B.: Using Twitter to recommend real-time topical news. In: RecSys (2009)
Popescu, A.M., Pennacchiotti, M.: Detecting controversial events from Twitter. In: CIKM (2010)
Prediction, Optimization and Control for Information Propagation on Networks: A Differential Equation and Mass Transportation Based Approach. https://www.nsf.gov/awardsearch/showAward?AWD_ID=1620342 (2017)
Presto. http://prestodb.io/ (2018)
Public Health Emergency, Department of Health and Human Services. http://nowtrending.hhs.gov/ (2015)
Qadir, A., Mendes, P.N., Gruhl, D., Lewis, N.: Semantic lexicon induction from Twitter with pattern relatedness and flexible term length. In: AAAI (2015)
Qian, Y., Tang, J., Yang, Z., Huang, B., Wei, W., Carley, K.M.: A probabilistic framework for location inference from social media. In: CoRR. arXiv:1702.07281 (2017)
Qiu, L., Lei, Q., Zhang, Z.: Advanced sentiment classification of Tibetan microblogs on smart campuses based on multi-feature fusion. IEEE Access 6, 17896–17904 (2018)
Rajendram, S.M., Mirnalinee, T.T., et al.: SSN_MLRG1 at SemEval-2017 Task 4: sentiment analysis in Twitter using multi-kernel gaussian process classifier. In: SemEval (2017)
Ramage, D., Dumais, S.T., Liebling, D.J.: Characterizing microblogs with topic models. In: ICWSM (2010)
Ranganathan, J., Irudayaraj, A.S., Tzacheva, A.A.: Action rules for sentiment analysis on Twitter data using Spark. In: ICDM Workshops (2017)
Redis. https://redis.io/ (2018)
Ren, Y., Zhang, Y., Zhang, M., Ji, D.: Context-sensitive Twitter sentiment classification using neural network. In: AAAI (2016)
Ren, Y., Zhang, Y., Zhang, M., Ji, D.: Improving Twitter sentiment classification using topic-enriched multi-prototype word embeddings. In: AAAI (2016)
Ribeiro, M.H., Calais, P.H., Santos, Y.A., Almeida, V.A.F., Meira, W. Jr.: Characterizing and detecting hateful users on Twitter. In: CoRR. arXiv:1803.08977 (2018)
Rios, M., Lin, J.J.: Visualizing the “Pulse” of world cities on Twitter. In: ICWSM (2013)
Rios, R.A., Pagliosa, P.A., Ishii, R.P., de Mello, R.F.: TSViz: a data stream architecture to online collect, analyze, and visualize tweets. In: SAC (2017)
Ritter, A., Etzioni, O., Clark, S., et al.: Open domain event extraction from Twitter. In: SIGKDD (2012)
RocksDB. https://rocksdb.org/ (2018)
Romero, S., Becker, K.: A framework for event classification in tweets based on hybrid semantic enrichment. Expert Syst. Appl. 118, 522–538 (2019)
Rozental, A., Fleischer, D.: Amobee at SemEval-2017 Task 4: deep learning system for sentiment detection on Twitter. arXiv:1705.01306 (2017)
Rudra, K., Ghosh, S., Ganguly, N., Goyal, P., Ghosh, S.: Extracting situational information from microblogs during disaster events: a classification-summarization approach. In: CIKM (2015)
Rudra, K., Goyal, P., Ganguly, N., Mitra, P., Imran, M.: Identifying sub-events and summarizing disaster-related information from microblogs. In: SIGIR (2018)
Ryoo, K., Moon, S.: Inferring Twitter user locations with 10 km accuracy. In: WWW Companion (2014)
Sakaki, T., Okazaki, M., Matsuo, Y.: Earthquake shakes Twitter users: real-time event detection by social sensors. In: WWW (2010)
Sang, J., Lu, D., Xu, C.: A probabilistic framework for temporal user modeling on microblogs. In: CIKM (2015)
Sankaranarayanan, J., Samet, H., Teitler, B.E., Lieberman, M.D., Sperling, J.: TwitterStand: news in tweets. In: SIGSPATIAL (2009)
Sarwat, M.: RecDB: towards DBMS support for online recommender systems. In: Proceedings of the ACM SIGMOD/PODS PhD Symposium 2012, Scottsdale, AZ, USA, May 20, 2012, pp. 33–38 (2012)
Sarwat, M., Avery, J.L., Mokbel, M.F.: A RecDB in action: recommendation made easy in relational databases. PVLDB 6(12), 1242–1245 (2013)
Sarwat, M., Avery, J.L., Mokbel, M.F.: RECATHON: a middleware for context-aware recommendation in database systems. In: MDM (2015)
Sarwat, M., Moraffah, R., Mokbel, M.F., Avery, J.L.: Database system support for personalized recommendation applications. In: ICDE (2017)
Satapathy, R., Guerreiro, C., Chaturvedi, I., Cambria, E.: Phonetic-based microtext normalization for Twitter sentiment analysis. In: ICDM Workshops (2017)
Sharma, A., Jiang, J., Bommannavar, P., Larson, B., Lin, J.: GraphJet: real-time content recommendations at Twitter. In: VLDB, pp. 1281–1292 (2016)
Si, J., Mukherjee, A., Liu, B., Li, Q., Li, H., Deng, X.: Exploiting topic based Twitter sentiment for stock prediction. In: ACL, vol. 2 (2013)
Sijtsma, B., Qvarfordt, P., Chen, F.: Tweetviz: visualizing tweets for business intelligence. In: SIGIR (2016)
Singh, V.K., Gao, J.R.: Situation detection and control using spatio-temporal analysis of microblogs. In: WWW (2010)
Sina Weibo, China Twitter, comes to rescue amid flooding in Beijing. http://thenextweb.com/asia/2012/07/23/sina-weibo-chinas-twitter-comes-to-rescue-amid-flooding-in-beijing/ (2012)
Skovsgaard, A., Sidlauskas, D., Jensen, C.S.: Scalable top-k spatio-temporal term querying. In: ICDE (2014)
Smith, K.S., McCreadie, R., Macdonald, C., Ounis, I.: Analyzing disproportionate reaction via comparative multilingual targeted sentiment in Twitter. In: ASONAM (2017)
Soto, A.J., Brooks, S., Raheleh, M., Milios, E.E.: Twitter message recommendation based on user interest profiles. In: ASONAM (2016)
Sparsity Models for Forecasting Spatio-Temporal Human Dynamics. https://www.nsf.gov/awardsearch/showAward?AWD_ID=1737770 (2017)
Søgaard, A., Plank, B., Alonso, H.M.: Using frame semantics for knowledge extraction from Twitter. In: AAAI (2015)
Song, K., Chen, L., Gao, W., Feng, S., Wang, D., Zhang, C.: Persentiment: a personalized sentiment classification system for microblog users. In: WWW Companion (2016)
Sotiropoulos, D.N., Kounavis, C.D., Giaglis, G.M.: Semantically meaningful group detection within sub-communities of Twitter blogosphere: a topic oriented multi-objective clustering approach. In: ASONAM (2013)
Soulier, L., Lynda, T., Gia-Hung, N.: Answering Twitter questions: a model for recommending answerers through social collaboration. In: CIKM (2016)
Speriosu, M., Sudan, N., Upadhyay, S., Baldridge, J.: Twitter polarity classification with label propagation over lexical links and the follower graph. In: Workshop on Unsupervised Learning in NLP (2011)
Steiger, E., Resch, B., Zipf, A.: Exploration of spatiotemporal and semantic clusters of Twitter data using unsupervised neural networks. IJGIS 30(9), 1694–1716 (2016)
Stonebraker, M., Weisberg, A.: The VoltDB main memory DBMS. IEEE Data Eng. Bull. 36(2), 21–27 (2013)
Sundararaman, D., Srinivasan, S.: Twigraph: discovering and visualizing influential words between Twitter profiles. In: Social Informatics (2017)
Symeonidis, S., Effrosynidis, D., Kordonis, J., Arampatzis, A.: DUTH at SemEval-2017 Task 4: a voting classification approach for Twitter sentiment analysis. In: SemEval (2017)
Symeonidis, S., Kordonis, J., Effrosynidis, D., Arampatzis, A.: DUTH at SemEval-2017 Task 5: sentiment predictability in financial microblogging and news articles. In: SemEval (2017)
Tabari, N., Seyeditabari, A., Zadrozny, W.: SentiHeros at SemEval-2017 Task 5: an application of sentiment analysis on financial tweets. In: SemEval (2017)
Tan, C., Lee, L., Tang, J., Jiang, L., Zhou, M., Li, P.: User-level sentiment analysis incorporating social networks. In: SIGKDD (2011)
Tang, D., Wei, F., Qin, B., Liu, T., Zhou, M.: Coooolll: a deep learning system for Twitter sentiment classification. In: SemEval@COLING (2014)
Tang, D., Wei, F., Yang, N., Zhou, M., Liu, T., Qin, B.: Learning sentiment-specific word embedding for Twitter sentiment classification. In: ACL (2014)
Thelwall, M., Buckley, K., Paltoglou, G.: Sentiment strength detection for the social web. JASIST 63(1), 163–173 (2012)
Topsy Analytics: Find the insights that matter. www.topsy.com (2014)
Turet, J.G., Costa, A.P.C.S.: Big data analytics to improve the decision-making process in public safety: a case study in Northeast Brazil. In: Springer ICDSST (2018)
Tweet Complete Index. https://blog.twitter.com/engineering/en_us/a/2014/building-a-complete-tweet-index.html
TweetTracker: track, analyze, and understand activity on Twitter. tweettracker.fulton.asu.edu/ (2014)
Twitter and Informal Science Learning and Engagement. https://www.nsf.gov/awardsearch/showAward?AWD_ID=1438898 (2017)
The Power of Images: A Computational Investigation of Political Mobilization via Social Media. https://www.nsf.gov/awardsearch/showAward?AWD_ID=1727459 (2017)
Twitter Data Changing Future of Population Research. http://news.psu.edu/story/474782/2017/07/17/research/twitter-data-changing-future-population-research (2017)
Twitter Statistics. https://about.twitter.com/company (2018)
The Twitter War: Social Media’s Role in Ukraine Unrest. news.nationalgeographic.com/news/2014/05/140510-ukraine-odessa-russia-kiev-twitter-world/ (2014)
Twitter a Big Winner in 2012 Presidential Election. https://www.computerworld.com/article/2493332/social-media/twitter-a-big-winner-in-2012-presidential-election.html (2012)
Topsy Analytics for Twitter Political Index. https://blog.twitter.com/official/en_us/a/2012/a-new-barometer-for-the-election.html
Understanding Social and Geographical Disparities in Disaster Resilience Through the Use of Social Media. https://www.nsf.gov/awardsearch/showAward?AWD_ID=1620451 (2017)
Vesdapunt, N., Garcia-Molina, H.: Identifying users in social networks with limited information. In: ICDE (2015)
Vo, D.T., Zhang, Y.: Target-dependent Twitter sentiment classification with rich automatic features. In: IJCAI (2015)
VoltDB. https://www.voltdb.com/ (2018)
Vosecky, J., Jiang, D., Leung, K.W.-T., Xing, K., Ng, W.: Integrating social and auxiliary semantics for multifaceted topic modeling in Twitter. ACM TOIT 14(4), 27:1–27:24 (2014)
Vydiswaran, V.G.V., Romero, D.M., Zhao, X., Yu, D., Gomez-Lopez, I.N., Lu, J.X., Iott, B., Baylin, A., Clarke, P., Berrocal, V.J., et al.: “Bacon Bacon Bacon”: food-related tweets and sentiment in metro detroit. In: ICWSM (2018)
Wakamiya, S., Jatowt, A., Kawai, Y., Akiyama, T.: Analyzing global and pairwise collective spatial attention for geo-social event detection in microblogs. In: WWW Companion (2016)
Wang, M., Chu, B., Liu, Q., Zhou, X.: YNUDLG at SemEval-2017 Task 4: A GRU-SVM model for sentiment classification and quantification in Twitter. In: SemEval-2017 (2017)
Wang, X., Zhang, Y., Zhang, W., Lin, X., Wang, W.: AP-Tree: efficiently support continuous spatial-keyword queries over stream. In: ICDE (2015)
Wang, X., Wei, F., Liu, X., Zhou, M., Zhang, M.: Topic sentiment analysis in Twitter: a graph-based hashtag sentiment classification approach. In: CIKM (2011)
Wang, Y., Liu, J., Huang, Y., Feng, X.: Using hashtag graph-based topic model to connect semantically-related words without co-occurrence in microblogs. TKDE 28(7), 1919–1933 (2016)
Wang, Y., Siriaraya, P., Nakaoka, Y., Sakata, H., Kawai, Y., Akiyama, T.: A Twitter-based culture visualization system by analyzing multilingual geo-tagged tweets. In: ICADL (2018)
Wang, Z., Zhang, Y., Li, Y., Wang, Q., Xia, F.: Exploiting social influence for context-aware event recommendation in event-based social networks. In: INFOCOM (2017)
Watanabe, K., Ochi, M., Okabe, M., Onai, R.: Jasmine: a real-time local-event detection system based on geolocation information propagated to microblogs. In: CIKM (2011)
Weber, I., Garimella, V.R.K.: Visualizing user-defined, discriminative geo-temporal Twitter activity. In: ICWSM (2014)
Welch, M.J., Schonfeld, U., He, D., Cho, J.: Topical semantics of Twitter links. In: WSDM (2011)
Wu, F., Huang, Y.: Personalized microblog sentiment classification via multi-task learning. In: AAAI (2016)
Wu, S., Gong, L., Rand, W., Raschid, L.: Making recommendations in a microblog to improve the impact of a focal user. In: RecSys (2012)
Wu, X., Bartram, L., Shaw, C.: Plexus: an interactive visualization tool for analyzing public emotions from Twitter data. In: CoRR. arXiv:1701.06270 (2017)
Wu, Y.: Language E-learning based on learning analytics in big data era. In: International Conference on Big Data and Education (2018)
Xiang, B., Zhou, L.: Improving Twitter sentiment analysis with topic-based mixture modeling and semi-supervised training. In: ACL, vol. 2 (2014)
Xie, Q., Zhang, X., Zhixu, L., Zhou, X.: Optimizing cost of continuous overlapping queries over data streams by filter adaption. TKDE 28(5), 1258–1271 (2016)
Xing, C., Wang, Y., Liu, J., Huang, Y., Ma, W.Y.: Hashtag-based sub-event discovery using mutually generative LDA in Twitter. In: AAAI, pp. 2666–2672 (2016)
Xiong, X., Mokbel, M.F., Aref, W.G.: SEA-CNN: scalable processing of continuous K-nearest neighbor queries in spatio-temporal databases. In: ICDE (2005)
Yang, T.H., Tseng, T.H., Chen, C.P.: deepSA at SemEval-2017 Task 4: interpolated deep neural networks for sentiment analysis in Twitter. In: SemEval (2017)
Yao, J., Cui, B., Xue, Z., Liu, Q.: Provenance-based indexing support in micro-blog platforms. In: ICDE (2012)
Yen, A.Z., Huang, H.H., Chen, H.H.: Detecting personal life events from Twitter by multi-task LSTM. In: WWW Companion (2018)
Yin, H., Cui, B., Chen, L., Hu, Z., Zhang, C.: Modeling location-based user rating profiles for personalized recommendation. TKDD 9(3), 19:1–19:41 (2015)
Yin, Y., Song, Y., Zhang, M.: NNEMBs at SemEval-2017 Task 4: neural Twitter sentiment classification: a simple ensemble method with different embeddings. In: SemEval (2017)
Yang, X.W., Yu, Z.: Xinjie: user embedding for scholarly microblog recommendation. In: ACL, vol. 2 (2016)
Zhiwen, Y., Wang, Z., Chen, L., Guo, B., Li, W.: Featuring, detecting, and visualizing human sentiment in Chinese micro-blog. TKDD 10(4), 48 (2016)
Zayer, M.A., Gunes, M.H.: Analyzing the use of Twitter to disseminate visual impairments awareness information. In: ASONAM (2017)
Zhang, C., Lei, D., Yuan, Q., Zhuang, H., Kaplan, L., Wang, S., Han, J.: GeoBurst+: effective and real-time local event detection in geo-tagged tweet streams. ACM TIST 9(3), 34 (2018)
Zhang, C., Liu, L., Lei, D., Yuan, Q., Zhuang, H., Hanratty, T., Han, J.: Triovecevent: embedding-based online local event detection in geo-tagged tweet streams. In: SIGKDD (2017)
Zhang, C., Zhou, G., Yuan, Q., Zhuang, H., Zheng, Y., Kaplan, L., Wang, S., Han, J.: Geoburst: real-time local event detection in geo-tagged tweet streams. In: SIGIR (2016)
Zhang, D., Liu, Y., Lawrence, R.D., Chenthamarakshan, V.: Transfer latent semantic learning: microblog mining with less supervision. In: AAAI (2011)
Zhang, D., Chan, C.Y., Tan, K.L.: Processing spatial keyword query as a top-k aggregation query. In: SIGIR (2014)
Zhang, D., Nie, L., Luan, H., Tan, K.-L., Chua, T.-S., Shen, H.T.: Compact indexing and judicious searching for billion-scale microblog retrieval. ACM TOIS 35(3), 27 (2017)
Zhang, D., Tan, K.L., Tung, A.K.H.: Scalable top-k spatial keyword search. In: EDBT (2013)
Zhang, H., Chen, G., Ooi, B.C., Wong, W.F., Wu, S., Xia, Y.: “Anti-caching”-based elastic memory management for big data. In: ICDE (2015)
Zhang, J., Zhang, R., Sun, J., Zhang, Y., Zhang, C.: TrueTop: a sybil-resilient system for user influence measurement on Twitter. IEEE/ACM TON 24(5), 2834–2846 (2016)
Zhang, L., Ghosh, R., Dekhil, M., Hsu, M., Liu, B.: Combining lexicon-based and learning-based methods for Twitter sentiment analysis. HP Laboratories, Technical Report HPL-2011, p. 89 (2011)
Zhang, Y., Szabo, C., Sheng, Q.Z., Fang, X.S.: SNAF: observation filtering and location inference for event monitoring on Twitter. WWW J. 21(2), 311–343 (2018)
Zhang, Y., Fan, Y., Ye, Y., Li, X., Winstanley, E.: Utilizing social media to combat opioid addiction epidemic: automatic detection of opioid users from Twitter. In: AAAI Workshops (2018)
Zhang, Z., Lan, M.: Estimating semantic similarity between expanded query and tweet content for microblog retrieval. In: TREC (2014)
Zhao, J., Lan, M., Zhu, T.: ECNU: expression-and message-level sentiment orientation classification in Twitter using multiple effective features. In: SemEval (2014)
Zhao, J., Gui, X., Tian, F.: A new method of identifying influential users in the micro-blog networks. IEEE Access 5, 3008–3015 (2017)
Zhao, J., Lui, J.C.S., Towsley, D., Wang, P., Guan, X.: Sampling design on hybrid social-affiliation networks. In: ICDE (2015)
Zhao, L., Chen, F., Chang-Tien, L., Ramakrishnan, N.: Online spatial event forecasting in microblogs. ACM TSAS 2(4), 15 (2016)
Zhao, W.X., Guo, Y., He, Y., Jiang, H., Wu, Y., Li, X.: We know what you want to buy: a demographic-based system for product recommendation on microblogs. In: KDD (2014)
Zhao, W.X., Sui, L., Yulan, H., Chang, E.Y., Ji-Rong, W., Li, X.: Connecting social media to e-commerce: cold-start product recommendation using microblogging information. TKDE 28(5), 1147–1159 (2016)
Zheng, X., Sun, A., Wang, S., Han, J.: Semi-supervised event-related tweet identification with dynamic keyword generation. In: CIKM (2017)
Zhou, D., Chen, L., He, Y.: An unsupervised framework of exploring events on Twitter: filtering, extraction and categorization. In: AAAI (2015)
Zhou, D., Gao, T., He, Y.: Jointly event extraction and visualization on Twitter via probabilistic modelling. In: ACL, vol. 1 (2016)
Zhou, X., Chen, L.: Event detection over Twitter social media streams. VLDB J. 23(3), 381–400 (2014)
Zhou, Y., Cristea, A.I., Shi, L.: Connecting targets to tweets: semantic attention-based model for target-specific stance detection. In: WISE (2017)
Zhu, R., Wang, B., Yang, X., Zheng, B., Wang, G.: SAP: improving continuous top-K queries over streaming data. In: ICDE (2018)
Zhu, X., Huang, J., Zhu, S., Chen, M., Zhang, C., Li, Z., Dongchuan, H., Chengliang, Z., Li, A., Jia, Y.: NUDTSNA at TREC 2015 microblog track: a live retrieval system framework for social network based on semantic expansion and quality model. In: TREC (2015)
Zini, T., Becker, K., Dias, M.: INF-UFRGS at SemEval-2017 Task 5: a supervised identification of sentiment score in tweets and headlines. In: SemEval (2017)

Author information

Laila Abdelhafeez and Yunfan Kang have equal contributions and are ordered alphabetically.

Authors and Affiliations

Department of Computer Science and Engineering, University of California, Riverside, Riverside, CA, USA
Amr Magdy, Laila Abdelhafeez, Yunfan Kang & Eric Ong
Department of Computer Science and Engineering, University of Minnesota, Twin Cities, Minneapolis, MN, USA
Mohamed F. Mokbel

Corresponding author

Correspondence to Amr Magdy.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work is partially supported by the National Science Foundation, USA, under Grants IIS-1849971, SES-1831615, and CNS-1837577.

Appendices

Appendix

Orthogonal research directions

This appendix gives an overview of sentiment and semantic analysis in microblogs as an example of an orthogonal research direction, from the natural language processing literature, that does not exploit much of the data management infrastructure. The appendix highlights the differences between new techniques on microlength data and the corresponding techniques on traditional long data. For detailed surveys on these topics, the reader can refer to [80, 117].

1.1 Sentiment analysis

Sentiment analysis automatically discovers the polarity of feelings expressed in a chunk of text, e.g., a citizen posts positive or negative opinions about a certain election candidate. Traditional sentiment analysis techniques benefit from the brevity of microblogs, which enhances the classification accuracy of user sentiment. As reported in [46], using a traditional sentiment classification technique on microblogs boosts the accuracy by up to 10% for binary sentiment. This boost is a direct advantage of content brevity, which makes positive and negative feelings in user-generated content less ambiguous and easier to detect. However, the brevity of microblogs also introduces challenges and differences compared with traditional data. For example, feature extraction is more challenging due to the many abbreviations and the noise, e.g., extracting meaningful keywords is harder. In addition, compared with traditional data where sentiment is analyzed on three different levels (document level, sentence level, and entity level), the short content of microblogs mostly limits the sentiment scope to a single sentence or a single entity that represents the whole microdocument. Moreover, microblogs come with additional advantageous features that were not available in traditional data, such as links, user information, and their interactions with different topics. Thus, sentiment analysis research on microblogs has addressed a wider variety of challenges compared with traditional sentiment analysis. In this section, we give an overview of this rich literature.

Figure 15 depicts an overview of the microblogs sentiment analysis literature. The major techniques fall into four main categories, namely, machine learning techniques, lexical techniques, hybrid techniques, and miscellaneous techniques. The machine learning techniques represent the majority of techniques in the literature. They can be further divided into four sub-categories as depicted in Fig. 15, namely, supervised, classifier ensemble, deep learning, and semi-supervised. The first sub-category uses supervised machine learning, i.e., traditional classifiers [4, 36, 67, 87, 202, 220, 254, 255, 276, 281, 318, 351, 393]. These techniques differ in the classifier type, the stages, and the features used to distinguish sentiment. The most commonly used classifiers are support vector machines (SVM) [4, 5, 31, 35, 39, 83, 87, 120, 131, 160, 164, 172, 180, 248, 262, 275, 306], (multinomial) naive Bayes (MNB and NB) [31, 35, 120, 131, 247, 262], k-nearest neighbor (kNN) [31, 82], MaxEnt [92, 120, 180], random forest (RF) [89, 319], logistic regression [36, 202, 393], and AdaBoost [185]. The features used include different types of language-based features such as unigrams [5, 35, 120, 164, 247, 262], bigrams [35, 120, 262], trigrams [262], n-grams [82, 180, 185, 248], and POS tags [5, 39, 120, 180, 185, 247, 248, 262], microblog-specific features [185, 248] such as retweets [39], hashtags [35, 39, 164], emotions [35, 39, 164, 180], and links [35, 39], and other features such as punctuation-based [5, 82, 180, 248], pattern-based [5, 35, 82, 164, 180, 248], and semantic-based [180, 247] features.
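As a rough illustration of the feature types listed above, the following Python sketch maps a single post to sparse feature counts (unigrams, bigrams, hashtags, emoticons, links, and a punctuation count). It is a minimal sketch and not the method of any cited work: the function name `extract_features`, the feature naming scheme, and the tiny emoticon list are assumptions made for this example only.

```python
from collections import Counter

# Hypothetical emoticon list, chosen only for illustration.
EMOTICONS = {":)", ":(", ":D", ":-)", ":-("}


def extract_features(post):
    """Map a microblog post to a bag of sparse feature counts."""
    tokens = post.split()

    # Language-based features use plain alphabetic words only;
    # hashtags, emoticons, and links are handled separately below.
    words = []
    for t in tokens:
        w = t.strip(".,!?").lower()
        if w.isalpha():
            words.append(w)

    feats = Counter()
    for w in words:
        feats["uni=" + w] += 1
    # Bigrams are taken over the cleaned word sequence.
    for a, b in zip(words, words[1:]):
        feats["bi=" + a + "_" + b] += 1

    # Microblog-specific features.
    for t in tokens:
        if t.startswith("#") and len(t) > 1:
            feats["hashtag=" + t.lower()] += 1
        elif t in EMOTICONS:
            feats["emoticon=" + t] += 1
        elif t.startswith(("http://", "https://")):
            feats["has_link"] = 1

    # A simple punctuation-based feature.
    feats["exclamations"] = post.count("!")
    return feats


features = extract_features("Love this #Phone :) http://t.co/x so good!!")
```

The resulting counts could be fed to any of the classifiers named above; the choice of classifier is independent of this feature extraction step.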

To enhance the classification accuracy, techniques of the second sub-category combine multiple classifiers in an ensemble [61, 70, 75, 79, 86, 136, 172, 185, 194, 208, 244, 266, 317, 363]. The set of features used is almost identical to that of the single-classifier techniques, while the classification algorithms used overlap but are not identical. Specifically, SVM [79, 136, 266], NB [70, 136], MNB [79], logistic regression [79, 136, 208], and AdaBoost [185] are still used, while new classifiers are also introduced, such as neural models [86, 136, 363] and Bayes network [136]. A third sub-category is deep learning, which is an emerging field in machine learning. In the past few years, deep learning has gained increasing popularity, and many learning problems have migrated to deep learning frameworks. Deep learning offers a black box of neural networks that are trained on huge amounts of data and offer better accuracy than traditional classifiers. In the case of microblog platforms, a huge amount of data is generated daily, which has motivated the use of deep learning techniques for sentiment analysis on microblogs. Existing deep learning techniques are applied to short textual contexts in a two-step fashion [37, 62, 71, 86, 90, 134, 147, 148, 165, 265, 280, 288, 321, 322, 337, 342, 359]: they first learn word embeddings and then apply them to produce representations for the text sentiment.

The main limitation of all supervised techniques, whether with a single classifier, multiple classifiers, or deep learning models, is their sensitivity to dataset size. Improving their performance relies heavily on manually annotated labels, which are extremely expensive to obtain. To alleviate this problem, distant supervision has been employed, where the labels are generated based on emoticons and hashtags [258, 310]. However, this approach did not perform well. This encouraged the rise of a fourth sub-category of semi-supervised techniques. The semi-supervised techniques rely on both a small set of manually annotated data and unlabeled data to train the model. They can be further divided into three main types as depicted in Fig. 15: graph-based, wrapper-based, and topic-based techniques. The graph-based techniques [77, 313, 320, 344] use label propagation to label the unlabeled training data based on a similarity metric between nodes in the graph. Then, a classifier is trained and used as in the previous techniques. The wrapper-based techniques [44, 45, 214, 215, 380] rely on either self-training [44, 45, 380] or co-training [214, 215]. In both types, classification is an iterative process: starting with the initial labeled data, the classifier labels the unlabeled data, and the high-confidence predictions are added to the labeled data for the next iteration, until all data is labeled or the maximum number of iterations is reached. The difference between self-training and co-training is that self-training uses only one classifier, whereas co-training uses two classifiers with different feature sets to provide two different views of the data. The more confident classification of the two classifiers is chosen to join the labeled data in the next iteration. The last semi-supervised type is topic-based techniques [11, 123, 139, 166, 241, 301, 355], where topic information is extracted simultaneously with sentiment analysis, under the observation that the context of the content affects the sentiment.
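The wrapper-based self-training loop described above can be sketched as follows. This is a hedged illustration rather than the method of any cited work: the token-count base classifier, the confidence measure, the `threshold` and `max_iters` parameters, and the toy `seed` and `pool` data are all hypothetical choices made to keep the example self-contained.

```python
from collections import Counter


def train(labeled):
    """Count how often each token occurs under each label."""
    counts = {"pos": Counter(), "neg": Counter()}
    for text, label in labeled:
        counts[label].update(text.lower().split())
    return counts


def classify(counts, text):
    """Return (label, confidence) from smoothed per-class token votes."""
    pos = neg = 1.0  # start at 1 so unseen text gets confidence 0.5
    for tok in text.lower().split():
        pos += counts["pos"][tok]
        neg += counts["neg"][tok]
    label = "pos" if pos >= neg else "neg"
    return label, max(pos, neg) / (pos + neg)


def self_train(labeled, unlabeled, threshold=0.6, max_iters=10):
    """Iteratively move high-confidence predictions into the labeled set."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(max_iters):
        if not unlabeled:
            break  # everything has been labeled
        counts = train(labeled)
        confident, remaining = [], []
        for text in unlabeled:
            label, conf = classify(counts, text)
            if conf >= threshold:
                confident.append((text, label))
            else:
                remaining.append(text)
        if not confident:
            break  # no progress; stop instead of looping uselessly
        labeled.extend(confident)
        unlabeled = remaining
    return train(labeled), labeled, unlabeled


# Hypothetical seed labels and unlabeled pool.
seed = [("good great", "pos"), ("bad awful", "neg")]
pool = ["great win", "awful loss today", "loss again"]
counts, labeled, leftover = self_train(seed, pool)
```

In this toy run, "loss again" cannot be labeled confidently in the first iteration, but becomes labelable in the second once "awful loss today" has joined the training data. Co-training would follow the same loop with two classifiers over different feature sets, taking the more confident of the two predictions.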

The second major category in Fig. 15 is lexical techniques [30, 146, 152, 207, 260, 278, 299, 323, 340], where a predefined list of positive and negative words is employed to classify the sentiment of a new microblog. There are two main sub-categories of lexical techniques, namely dictionary-based and corpus-based. The dictionary-based techniques [30, 146, 152, 207, 299, 340] use dictionaries as lexical resources and apply approximate lexical matching to account for the noise and abbreviations in microblogs. The corpus-based technique [278] uses statistical or semantic methods to match incoming data with existing lexical resources. The third major category in Fig. 15 is hybrid techniques [107, 115, 175, 177, 182, 189, 376], which combine machine learning and lexical methods to detect microblogs sentiment. These techniques use lexical terms either to train a machine learning model or to filter data in a first stage, whose output is fed to a classifier for further processing in a second stage.
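As an illustration of the dictionary-based idea, the sketch below scores a post against a small word list and uses approximate string matching to tolerate misspellings. The lexicon contents, the `cutoff` value, and the function names are assumptions for this example; the cited techniques use much larger dictionaries and their own matching schemes. The fuzzy matching relies on `difflib.get_close_matches` from the Python standard library.

```python
import difflib

# Tiny illustrative lexicon; real systems use far larger dictionaries.
POSITIVE = {"good", "great", "love", "happy", "awesome"}
NEGATIVE = {"bad", "hate", "terrible", "sad", "awful"}
LEXICON = sorted(POSITIVE | NEGATIVE)


def match_lexicon(token, cutoff=0.8):
    """Return the closest lexicon entry for a token, or None if none is close."""
    if token in POSITIVE or token in NEGATIVE:
        return token  # exact hit, no fuzzy matching needed
    close = difflib.get_close_matches(token, LEXICON, n=1, cutoff=cutoff)
    return close[0] if close else None


def lexicon_sentiment(text):
    """Label text 'pos', 'neg', or 'neutral' by counting matched lexicon words."""
    score = 0
    for token in text.lower().split():
        entry = match_lexicon(token)
        if entry in POSITIVE:
            score += 1
        elif entry in NEGATIVE:
            score -= 1
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return "neutral"
```

For example, `lexicon_sentiment("grreat speech love it")` matches the misspelled "grreat" to "great". A lower `cutoff` catches more abbreviations but also raises the risk of matching unrelated words, which is the trade-off approximate matching introduces.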

Other miscellaneous techniques have also been proposed for microblogs sentiment analysis. ConSent [183] uses concept analysis to determine sentiment based on the associated topic. AppSent [184] uses appraisal terms to outperform supervised techniques. SocioSent [150] uses sociological information in the supervised learning process to improve the performance. ChineSentiment [365] proposes a rule-based model for analyzing sentiment features of different linguistic components, and a corresponding methodology for calculating sentiment using emoticon elements as auxiliary affective factors.

1.2 Semantic analysis

Semantic analysis is a popular analysis task that is widely used in the microblogs literature for different applications, such as topic modeling [339], knowledge extraction [309], community detection [190, 311], stance detection [390], sentiment analysis [182], event analysis [48, 261], effective microblog retrieval and ranking [379], and user recommendations [106]. This task automates discovering the meaning of a chunk of text by discovering semantic relationships that relate to real-world entities, such as places, persons, and organizations. For example, a text like "Trump to campaign for Cindy Hyde-Smith in Mississippi" can be related to two persons, Trump and Cindy Hyde-Smith, and one place, Mississippi. This relatedness connects the input text to a predefined set of semantic concepts or categories that are commonly extracted from human-contributed content, such as Wikipedia, or professionally maintained ontologies, such as the FOAF and DBpedia ontologies. This type of analysis was traditionally performed on long chunks of text, e.g., news articles, blog posts, or web documents. However, in microblogs, the textual content is very short and contains plenty of abbreviations, informality, and noisy terms. As shown in [242], such brevity hurts the performance of traditional semantic analysis techniques that depend on lexical matching and search-based retrieval in, for example, Wikipedia concepts.

To overcome the brevity problem, a general theme of semantic analysis research on microblogs is exploring different ways to enrich the short textual content of microblogs to enable accurate discovery of semantic relations. Existing techniques in the literature can be categorized into four categories, as depicted in Fig. 16, based on the source of enrichment: external documents-based techniques, machine learning-based techniques, hashtag-based techniques, and lexical techniques. Techniques of the first category [57, 113, 339, 350] depend on linking the short microblog document to external long documents, e.g., news articles or web documents, which allows traditional semantic analysis techniques to be applied with high precision. ToSem [339] performs semantic enrichment based on explicit web links that are included in the microblog to associate the linked web document. Then, it extracts both named entities and top-k terms from the web document to be appended to the microblog as auxiliary terms. NwSem [57] identifies online news articles that are related to the microblog post in order to extract named entities and include them in the user profile as semantic tags. UsrSem [350] explores the semantics of user interactions, specifically retweets and links that are embedded in tweets, and their role in inferring notions such as quality of user relationships, trust, and other attributes of user relationships. This could be applied to re-ranking microblogs based on importance, user interest, quality, etc. DiSem [113] maps microblog posts to Wikipedia articles and then uses the Wikipedia ontology for semantic categorization.

The second category is machine learning-based techniques [110, 149, 216, 217, 242, 287, 311, 370] that use either: (1) clustering to group different related microblogs and use their collective content to semantically label the whole cluster, or (2) classification that exploits annotated training data as an external source of information to learn different semantic classes of new microblogs. TrSem [370] introduces a novel transfer learning approach, namely transfer latent semantic learning, that utilizes a large number of tagged documents with rich information from domain-specific sources to discover latent semantics of the abbreviated text. AccSem [149] clusters related microblogs and uses the collective content of each cluster to automatically assign semantically meaningful labels. The semantic labels are solicited from external knowledge sources, such as Wikipedia and WordNet, based on informative fragments parsed from microblog contents. AdSem [242] uses SVM and naive Bayes classifiers to enhance the precision of mapping tweets to Wikipedia-based concepts. For this, it obtains an initial ranked list of candidate concepts through lexical matching, language modeling, and traditional techniques. Then, annotated training data is used to train classifiers that further classify microblogs, based on different feature vectors, to the correct semantic category, which significantly boosts both precision and recall. NomSem [216] uses SVM classifiers to identify nominal predicates in tweets. Then, a factor graph for each nominal predicate is constructed and joined with graphs of other predicates so their semantic arguments are jointly resolved. ComSem [311] clusters related microblogs to detect user groups within sub-communities. Then, a probabilistic model is employed to measure the semantic, or topical, coherence of the user group and filter out non-coherent groups. GeoSem [314] clusters microblogs based on spatial, temporal, and semantic features, including LDA topics, to evaluate the performance of combining different features in retrieving insights from microblogs data. ST-SRL [217] proposes a semi-supervised self-training approach that utilizes a small training dataset to label unlabeled tweets iteratively, increasing the training dataset size. Labeled data records with the highest confidence from two different labelers are used to enhance the classification accuracy in the following iterations. VecSem [110] performs a unique study that explores the effect of changing microblog-specific semantic representation features on the performance of semantic prediction. It studies a set of 13 microblog-specific prediction tasks to understand both textual and social aspects of different representations.

The third category is hashtag-based techniques [38, 264, 345]. Hashtags are user-defined tags included in microblog posts, which indicate the discussed topics and enable posts related to the same topics to be searched easily. These hashtags are used in different ways to discover latent semantic content in microblogs. SMOB [264] uses hashtags as seeds to generate potentially related links to web documents and ontology entries from both the FOAF and DBpedia ontologies. Then, relevant semantic relations to the discovered entities are appended to the microblog. EntSem [38] enriches semantics by retrieving a ranked list of the top-k hashtags that are relevant to a user's query and segmenting them into relevant individual words. Then, it retrieves a set of Wikipedia articles that are related to the tweet text, hashtags, and segmented hashtags. HGTM [345] introduces a new topic model that uses hashtags and their semantic relatedness to each other, captured through a graph structure. A graph of hashtag relatedness is constructed using probabilistic models; then, related hashtags are grouped into coherent topics.
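Segmenting a hashtag into individual words, as mentioned above for EntSem, can be illustrated with a simple dictionary-driven dynamic program. This is only a sketch of one plausible approach, not the segmentation algorithm of [38]: the function name `segment_hashtag`, the fewest-words criterion, and the toy `VOCAB` set are assumptions introduced here.

```python
def segment_hashtag(tag, vocab):
    """Split a hashtag into the fewest vocabulary words, or return None."""
    s = tag.lstrip("#").lower()
    # best[i] holds the shortest word list covering s[:i], or None if
    # no sequence of vocabulary words covers that prefix.
    best = [None] * (len(s) + 1)
    best[0] = []
    for end in range(1, len(s) + 1):
        for start in range(end):
            word = s[start:end]
            if best[start] is not None and word in vocab:
                candidate = best[start] + [word]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[len(s)]


# Hypothetical vocabulary; a real system would use a large word list.
VOCAB = {"super", "bowl", "sunday", "sun", "day", "election", "night"}
```

With this vocabulary, `segment_hashtag("#SuperBowlSunday", VOCAB)` prefers "sunday" over the longer split "sun" plus "day" because it minimizes the number of words. Hashtags containing out-of-vocabulary fragments return `None`, so a practical system would need a fallback such as keeping the hashtag unsegmented.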

The fourth category is lexical techniques [9, 10, 48, 81, 106, 179, 261, 273, 309, 392] that adapt traditional techniques designed for long text to be effective for short textual microblog content. InducSem [273] induces semantic entities using a lexical pattern-based approach that matches microblog text with seed keywords of each semantic category, e.g., food, sports, or vehicles. KnoSem [309] uses lexical resources that include a corpus and POS-tagged terms to label tweets with semantic frames for knowledge extraction purposes. EveSem [261] analyzes word co-occurrences to discover relationships among word pairs. Then, such features are used to calculate the pairwise similarity of tweets for event detection purposes. HIVSem [10] uses lexical matching techniques to analyze the presence of an HIV prevention drug on Twitter. PlcSem [179] extracts place semantics through LDA topic modeling from a collection of microblogs to abstract their content through probabilistic models into a set of coherent topics. Then, the extracted place semantics are analyzed for temporal changes, e.g., a sports arena could evolve over time to be a place for concerts and exhibitions. MonSem [48] uses lexical matching to match microblog content with semantic knowledge bases to monitor unexpected events on social media. LikSem [9] uses semantic user attributes to enhance link prediction among social media users. RetSem [392] uses lexical semantic features to enhance microblogs retrieval performance. TriSem [81] uses semantic relevance to filter tweets based on Wikipedia concepts and trigrams. RecSem [106] uses semantic relatedness to recommend users to follow. It links users to Wikipedia through lexical and disambiguation algorithms; then, similar users are recommended.

About this article

Cite this article

Magdy, A., Abdelhafeez, L., Kang, Y. et al. Microblogs data management: a survey. The VLDB Journal 29, 177–216 (2020). https://doi.org/10.1007/s00778-019-00569-6

Received: 03 January 2019
Revised: 07 April 2019
Accepted: 29 August 2019
Published: 18 September 2019
Issue Date: January 2020
DOI: https://doi.org/10.1007/s00778-019-00569-6
