Abstract
Cloud computing now plays an important role in storing both structured and unstructured data. This has contributed to very large data growth on web servers, a phenomenon that has come to be called Big Data. Cloud computing is used in many applications, most notably social networking and e-mail, which are an important source of data generated by communication between web users. These data express views and opinions on various topics and can help businesses and other decision makers base their decisions on predictions. Several methods have been proposed to achieve this goal. Recent work relies on deep learning to process large volumes of data because of its high performance in extracting predictions from the opinions of web users. This paper presents a new prediction approach based on Big Data analysis and deep learning for large-scale data, called PABIDDL. The proposed approach consists of three stages. The first stage reduces the Big Data with MapReduce on the Hadoop framework. The second stage initializes these data using the GloVe technique. In the final stage, the text data are classified into two polarity classes, advantages and disadvantages, using a CNN deep learning model. We also conducted an empirical study of PABIDDL and of models from related work on two standard datasets, IMDB and MR. The results show that our approach gives the best performance: we recorded 0.93, 0.90, and 0.92 for accuracy, recall, and F1-score, respectively. Furthermore, our approach achieved the fastest response time.
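The three stages named in the abstract can be illustrated with a minimal, self-contained sketch. It is not the authors' implementation: the reduction step is assumed here to be simple duplicate removal written in a map/reduce style (no Hadoop involved), the embedding table holds hypothetical toy vectors standing in for pretrained GloVe vectors, and the two convolution filters are hand-set rather than trained. All function and variable names are illustrative.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical 2-dimensional vectors standing in for pretrained GloVe embeddings.
TOY_EMBEDDINGS = {
    "great": [1.0, 0.0],
    "film": [0.1, 0.1],
    "boring": [0.0, 1.0],
}
UNKNOWN = [0.0, 0.0]

# Hand-set filters of width 1 (assumption: a real CNN would learn these).
POS_KERNEL = [[1.0, 0.0]]
NEG_KERNEL = [[0.0, 1.0]]


def map_phase(review):
    """Map: emit one (normalized text, 1) pair per review."""
    yield review.strip().lower(), 1


def reduce_phase(pairs):
    """Reduce: sum the counts per key, which collapses duplicate reviews."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def embed(text):
    """Look up one vector per token; unknown tokens map to a zero vector."""
    return [TOY_EMBEDDINGS.get(token, UNKNOWN) for token in text.split()]


def conv_max_pool(matrix, kernel):
    """Slide one filter over token windows, apply ReLU, then max-over-time pooling."""
    width = len(kernel)
    scores = []
    for start in range(len(matrix) - width + 1):
        window = matrix[start:start + width]
        score = sum(
            w * x
            for kernel_row, token_vec in zip(kernel, window)
            for w, x in zip(kernel_row, token_vec)
        )
        scores.append(max(0.0, score))
    return max(scores, default=0.0)


def classify(text):
    """Compare the pooled responses of the two filters to pick a class."""
    matrix = embed(text)
    pos = conv_max_pool(matrix, POS_KERNEL)
    neg = conv_max_pool(matrix, NEG_KERNEL)
    return "advantage" if pos >= neg else "disadvantage"


reviews = ["Great film", "great film ", "Boring film"]
reduced = reduce_phase(chain.from_iterable(map_phase(r) for r in reviews))
labels = {text: classify(text) for text in reduced}
```

In this sketch the reduction shrinks three reviews to two distinct texts before any embedding or convolution work is done, which is the general motivation for placing a data-reduction stage ahead of the classifier.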
Data availability
The dataset generated during the current study is available from the corresponding author on reasonable request.
Notes
HDFS Architecture https://hadoop.apache.org.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Haddad, O., Fkih, F. & Omri, M.N. Toward a prediction approach based on deep learning in Big Data analytics. Neural Comput & Applic 35, 6043–6063 (2023). https://doi.org/10.1007/s00521-022-07986-9
DOI: https://doi.org/10.1007/s00521-022-07986-9