Data Multiverse: The Uncertainty Challenge of Future Big Data Analytics

Tudoran, Radu; Nicolae, Bogdan; Brasche, Götz

doi:10.1007/978-3-319-53640-8_2

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10151)

Included in the following conference series:

International KEYSTONE Conference on Semantic Keyword-Based Search on Structured Data Sources

Abstract

With the explosion of data sizes, extracting valuable insight from big data becomes increasingly difficult. New challenges are emerging that complement the traditional, long-standing challenges of building scalable infrastructure and runtime systems that deliver the desired level of performance and resource efficiency. This vision paper focuses on one such challenge, which we refer to as analytics uncertainty: with so much data available from so many sources, it is difficult to anticipate what the data can be useful for, if at all. As a consequence, it is also difficult to anticipate which data processing algorithms and methods are the most appropriate for extracting value and insight. In this context, we contribute a study of the current state of the art in big data analytics, of the use cases in which analytics uncertainty is emerging as a problem, and of future research directions to address it.

Author information

Authors and Affiliations

Huawei German Research Center, Riesstraße 25, 80992, München, Germany
Radu Tudoran, Bogdan Nicolae & Götz Brasche

Corresponding author

Correspondence to Radu Tudoran.

Editor information

Editors and Affiliations

Department of Computer Science and Information Systems, Birkbeck University of London, London, UK
Andrea Calì
Computer Science Department, Technical University of Cluj-Napoca, Cluj-Napoca, Romania
Dorian Gorgan
Computer and Decision Engineering (CoDE) Department, Université Libre de Bruxelles, Brussels, Belgium
Martín Ugarte

About this paper

Cite this paper

Tudoran, R., Nicolae, B., Brasche, G. (2017). Data Multiverse: The Uncertainty Challenge of Future Big Data Analytics. In: Calì, A., Gorgan, D., Ugarte, M. (eds) Semantic Keyword-Based Search on Structured Data Sources. IKC 2016. Lecture Notes in Computer Science, vol 10151. Springer, Cham. https://doi.org/10.1007/978-3-319-53640-8_2

DOI: https://doi.org/10.1007/978-3-319-53640-8_2
Published: 15 February 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-53639-2
Online ISBN: 978-3-319-53640-8
eBook Packages: Computer Science, Computer Science (R0)
