Toward a prediction approach based on deep learning in Big Data analytics

  • Original Article
Neural Computing and Applications

Abstract

Cloud computing now plays an important role in storing both structured and unstructured data. This has contributed to very large data growth on web servers, a phenomenon that has come to be called Big Data. Cloud computing is adopted in many applications, most notably social networking and e-mail, which are an important source of data generated through communication between web users. These data express views and opinions on various topics and can help businesses and other decision makers base their decisions on predictions. Several methods have been proposed to achieve this goal. Recent ones rely on deep learning to process large volumes of data because of its high performance in extracting predictions from the opinions of web users. This paper presents a new prediction approach based on Big Data analysis and deep learning for large-scale data, called PABIDDL. The proposed approach consists of three stages. The first reduces the Big Data with MapReduce on the Hadoop framework. The second initializes the reduced data using the GloVe technique. The third classifies the text data into advantage and disadvantage poles using a CNN deep learning model. We also conducted an empirical study of PABIDDL and of models from related work on two standard datasets, IMDB and MR. The results show that our approach gives the best performance: 0.93, 0.90, and 0.92 for accuracy, recall, and F1-score, respectively. Our approach also achieved the fastest response time.
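The three stages named in the abstract can be illustrated with a minimal, single-machine sketch in plain Python. It is not the PABIDDL implementation. Every function name is hypothetical, the reduction criterion (dropping rare tokens after a map/reduce word count) is an assumption, `TOY_VECTORS` is a hand-made stand-in for pretrained GloVe vectors, and the convolution weights are hand-set rather than trained.

```python
import math
from collections import Counter
from itertools import chain

# Stage 1 (assumed form): MapReduce-style reduction.
# The map step emits (token, 1) pairs; the reduce step sums them per token.
# On Hadoop these steps would run as distributed jobs; here they run locally.
def map_tokens(doc):
    return [(tok, 1) for tok in doc.lower().split()]

def reduce_counts(pairs):
    counts = Counter()
    for tok, n in pairs:
        counts[tok] += n
    return counts

def reduce_corpus(docs, min_count=2):
    """Drop tokens seen fewer than min_count times (illustrative criterion)."""
    counts = reduce_counts(chain.from_iterable(map_tokens(d) for d in docs))
    keep = {tok for tok, n in counts.items() if n >= min_count}
    return [[t for t in d.lower().split() if t in keep] for d in docs]

# Stage 2 (assumed form): initialize tokens from a pretrained embedding table.
# TOY_VECTORS is a hypothetical 2-dimensional stand-in for GloVe vectors.
TOY_VECTORS = {"good": [1.0, 0.0], "bad": [-1.0, 0.0], "movie": [0.0, 1.0]}

def embed(tokens, table=TOY_VECTORS, dim=2):
    # Unknown tokens map to the zero vector.
    return [table.get(t, [0.0] * dim) for t in tokens]

# Stage 3 (assumed form): one convolution filter of width 2 over the token
# sequence, max-over-time pooling, then a sigmoid. Weights are hand-set.
def cnn_score(vectors, kernel=((1.0, 0.0), (1.0, 0.0)), bias=0.0):
    width, dim = len(kernel), len(kernel[0])
    # Zero-pad sequences shorter than the filter width.
    vectors = vectors + [[0.0] * dim] * (width - len(vectors))
    features = []
    for i in range(len(vectors) - width + 1):
        s = bias
        for j in range(width):
            s += sum(w * x for w, x in zip(kernel[j], vectors[i + j]))
        features.append(s)
    pooled = max(features)  # max-over-time pooling
    return 1.0 / (1.0 + math.exp(-pooled))

def predict(tokens):
    """Return 'advantage' when the score exceeds 0.5, else 'disadvantage'."""
    return "advantage" if cnn_score(embed(tokens)) > 0.5 else "disadvantage"

if __name__ == "__main__":
    docs = ["good movie", "bad movie", "a good movie", "bad bad plot"]
    for tokens in reduce_corpus(docs):
        print(tokens, predict(tokens))
```

In a real pipeline the counts would come from a Hadoop job, the table from a published GloVe file, and the filter weights from training on labeled reviews; the sketch only shows how the output of each stage feeds the next.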

Data availability

The dataset generated during the current study is available from the corresponding author on reasonable request.

Notes

  1. HDFS Architecture https://hadoop.apache.org.

Author information

Corresponding author

Correspondence to Omar Haddad.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Haddad, O., Fkih, F. & Omri, M.N. Toward a prediction approach based on deep learning in Big Data analytics. Neural Comput & Applic 35, 6043–6063 (2023). https://doi.org/10.1007/s00521-022-07986-9

  • DOI: https://doi.org/10.1007/s00521-022-07986-9
