Abstract
Social media analytics is a research area focused on extracting useful insights from social media data, with the aim of helping individuals and organizations make the best possible decisions in many domains (business, marketing, politics, health, etc.). Social networks, microblogging sites, and media-sharing websites are prominent examples of online social media. Built on Web 2.0 technologies, they promote interaction between users and these websites and shift the user's role from that of a mere consumer to that of a producer of social data. As a result, huge amounts of social data are generated, making social media a critical source of Big Data. Traditional media analytics techniques appear obsolete and inadequate for processing this large volume of unstructured social media data and for coping with its scale, particularly the shift from batch to stream processing. This has led to the integration of Big Data technologies throughout the analysis process. The present survey aims to help researchers identify the challenges encountered during the analysis process, along with the Big Data solutions that address them, and to provide a clear analytical process that can be applied with Big Data technologies. A systematic literature review is conducted to address the challenges of integrating Big Data technologies and to present suitable solutions. Based on an extensive literature search, an overall view of how social media analytics and Big Data technologies overlap is drawn and discussed, along with a promising research trend.
Sebei, H., Hadj Taieb, M.A. & Ben Aouicha, M. Review of social media analytics process and Big Data pipeline. Soc. Netw. Anal. Min. 8, 30 (2018). https://doi.org/10.1007/s13278-018-0507-0