DOI: 10.1145/3010089.3010090
research-article

Defining Big Data

Published: 10 November 2016

ABSTRACT

As Big Data becomes better understood, there is a need for a comprehensive definition of Big Data to support work in fields such as data quality for Big Data. Existing definitions take one of three approaches: they define Big Data by comparison with existing, usually relational, definitions; they define it in terms of data characteristics; or they combine data characteristics with the Big Data environment. In this paper we examine existing definitions of Big Data and discuss the strengths and limitations of these approaches, with particular reference to issues related to data quality in Big Data. We identify the issues presented by incomplete or inconsistent definitions. We propose an alternative definition and relate it to our work on quality in Big Data.


Published in

    ACM Other conferences
    BDAW '16: Proceedings of the International Conference on Big Data and Advanced Wireless Technologies
    November 2016
    398 pages
    ISBN: 9781450347792
    DOI: 10.1145/3010089

    Copyright © 2016 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • research-article
    • Research
    • Refereed limited
