Abstract
The accumulation of big data within business intelligence matters because such data spans a complete conceptual and technological stack, including raw and processed data, data management, and analytics. Evaluation with a Data Quality Model Based-In-Use has gained ground because business value can only be estimated in the context in which the data is used. Despite the numerous data quality models used for regular data quality assessment, none has been adapted to big data. For this reason, we propose four efficiencies and four metabolism processes as data quality indicators for big data research. The model obtains the quality-in-use levels of the input data for big data analytics, and the adequacy of those levels can be read as an indicator of the dependability and adequacy of a big data analysis. We also demonstrate practical examples together with a proposed method, a stacked recurrent neural network for data quality assessment. Because the model does not depend on any preconditions or technologies, it could be integrated into various big data research efforts.
Change history
15 July 2024
This article has been retracted. Please see the Retraction Notice for more detail: https://doi.org/10.1007/s11227-024-06358-5
Acknowledgements
This paper was partially funded by the National Key R&D Program of China under Grant No. 2018YFB1004700 and by NSFC Grant Nos. U1866602, 61602129, and 61772157.
About this article
Cite this article
Ngueilbaye, A., Wang, H., Khan, M. et al. RETRACTED ARTICLE: Adoption of human metabolic processes as Data Quality Based Models. J Supercomput 77, 1779–1817 (2021). https://doi.org/10.1007/s11227-020-03300-3
DOI: https://doi.org/10.1007/s11227-020-03300-3