Abstract
Data quality refers to the fitness of data for its intended use. Poor data quality leads to inadequate, inconsistent, and erroneous decisions that can raise computational costs, reduce profits, and drive customer churn. Data quality is therefore crucial for researchers and industry practitioners alike.
Several factors drive the assessment of data quality. Data context is considered one of the key factors because real-world use cases vary widely across entities such as people and organizations. Data that is fit for use in one context (e.g., an organization's policy) may not be fit for use in another. Hence, implementing a data quality assessment solution that works across different contexts is challenging.
Traditional technologies for data quality assessment have reached a high level of maturity, and existing solutions can resolve most quality issues. In these solutions, the data context is defined as validation rules applied within the ETL (extract, transform, load) process, i.e., the data warehousing process. For big data, in contrast to traditional data quality management, it is impossible to specify all the data semantics beforehand. Context-aware data quality rules are needed to detect semantic errors in massive amounts of heterogeneous data generated at high speed. While many researchers tackle the quality issues of big data, each defines the data context from a specific standpoint. Although data quality is a longstanding research issue in academia and industry, it remains open, and the advent of big data has made data quality assessment more challenging than ever.
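To make the idea of context-dependent rules concrete, the following is a minimal sketch, not a method taken from the reviewed solutions. It assumes a context can be modeled as a named set of validation rules; the names `Context`, `assess`, `billing`, and `marketing` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]
Rule = Callable[[Record], bool]


@dataclass(frozen=True)
class Context:
    """Hypothetical usage context: a name plus the rules that apply in it."""
    name: str
    rules: Dict[str, Rule]


def assess(record: Record, context: Context) -> List[str]:
    """Return the names of the rules that the record violates in this context."""
    return [name for name, rule in context.rules.items() if not rule(record)]


# Two illustrative contexts with different fitness requirements (assumed).
billing = Context("billing", {
    "has_postal_code": lambda r: bool(r.get("postal_code")),
    "has_email": lambda r: bool(r.get("email")),
})
marketing = Context("marketing", {
    "has_email": lambda r: bool(r.get("email")),
})

# The same record is fit for one context and unfit for the other.
record = {"email": "a@example.org", "postal_code": ""}
violations_in_marketing = assess(record, marketing)
violations_in_billing = assess(record, billing)
```

In this sketch the record passes every marketing rule but violates the billing rule on postal codes, which is the sense in which fitness for use depends on context rather than on the data alone.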
This article presents a scoping review of existing context-aware data quality assessment solutions, starting with big data quality solutions in general and then covering context-aware solutions. The strengths and weaknesses of these solutions are outlined and discussed. The survey showed that none of the existing data quality assessment solutions could guarantee context awareness together with the ability to handle big data. Notably, each solution dealt with only a partial view of the context. We compared the existing quality models and solutions to reach a comprehensive view covering the aspects of context awareness in data quality assessment. This led us to a set of recommendations, framed as a methodological framework, to shape the design and implementation of any context-aware data quality service for big data. Open challenges are then identified and discussed.
Index Terms
- Context-aware Big Data Quality Assessment: A Scoping Review