Big Data Storage and Processing on Azure Clouds: Experiments at Scale and Lessons Learned

Tudoran, Radu; Costan, Alexandru; Antoniu, Gabriel; Goetz, Brasche

doi:10.1007/978-1-4939-1905-5_14

Abstract

Data-intensive computing is now starting to be considered the basis for a new, fourth paradigm for science. Two factors are encouraging this trend. First, vast amounts of data are becoming available in more and more application areas. Second, infrastructures that can persistently store these data for sharing and processing are becoming a reality. This makes it possible to unify the knowledge acquired through the previous three paradigms of scientific research (theory, experiments and simulations) with vast amounts of multidisciplinary data. The technical and scientific issues related to this context have been designated as the “Big Data” challenges. In this landscape, building a functional infrastructure that meets the requirements of Big Data applications is critical and remains a challenge. An important step has been made thanks to the emergence of cloud infrastructures, which provide the first building blocks for coping with the scale of the Big Data vision. Clouds create the illusion of a (more or less) infinitely scalable infrastructure managed through a fully outsourced ICT service: instead of buying and managing hardware, users “rent” outsourced resources as needed. However, cloud technologies have not yet reached their full potential. In particular, the capabilities currently available for data storage and processing are still far from meeting application requirements. In this work we investigate several key challenges related to Big Data management on clouds. We discuss current state-of-the-art solutions, their limitations and some ways to overcome them. We illustrate our study with a concrete application from the area of joint genetic and neuroimaging data analysis. The goal of this chapter is to present the conclusions of this study, drawn from a large-scale experiment carried out across three data centers of Microsoft’s Azure cloud platform over 2 weeks, which consumed approximately 200,000 compute hours.
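To put the scale of the experiment in perspective, the two figures in the abstract (about 200,000 compute hours over 2 weeks) can be turned into an average degree of parallelism. The sketch below is a back-of-the-envelope calculation, not a result reported by the authors. It assumes that a "compute hour" is one hour of a single compute unit (for example a core or a VM instance) and that the load was spread evenly over exactly 14 days; the names `average_concurrency`, `TOTAL_COMPUTE_HOURS` and `DURATION_DAYS` are hypothetical.

```python
# Back-of-the-envelope estimate of average parallelism (illustrative only).
# ASSUMPTION: one "compute hour" = one hour of a single compute unit, and the
# work was spread evenly over exactly 14 days of wall-clock time.
TOTAL_COMPUTE_HOURS = 200_000   # approximate figure quoted in the abstract
DURATION_DAYS = 14              # "2 weeks", assumed to mean exactly 14 days


def average_concurrency(total_hours: float, days: float) -> float:
    """Return the mean number of compute units running in parallel."""
    wall_clock_hours = days * 24  # length of the experiment in hours
    return total_hours / wall_clock_hours


if __name__ == "__main__":
    avg = average_concurrency(TOTAL_COMPUTE_HOURS, DURATION_DAYS)
    print(f"~{avg:.0f} compute units busy on average")
```

Under these assumptions, roughly 595 compute units would have been busy at any given moment; the real deployment may have had peaks well above or below this average.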

Author information

Authors and Affiliations

INRIA Rennes, Campus de Beaulieu, 35042, Rennes, France
Radu Tudoran, Alexandru Costan & Gabriel Antoniu
Huawei Technologies Duesseldorf GmbH, Düsseldorf, Germany
Brasche Goetz

Corresponding author

Correspondence to Alexandru Costan.

Editor information

Editors and Affiliations

University of Florida, Gainesville, Florida, USA
Xiaolin Li
Indiana University, Bloomington, Indiana, USA
Judy Qiu

About this chapter

Cite this chapter

Tudoran, R., Costan, A., Antoniu, G., Goetz, B. (2014). Big Data Storage and Processing on Azure Clouds: Experiments at Scale and Lessons Learned. In: Li, X., Qiu, J. (eds) Cloud Computing for Data-Intensive Applications. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-1905-5_14

DOI: https://doi.org/10.1007/978-1-4939-1905-5_14
Published: 15 November 2014
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-1904-8
Online ISBN: 978-1-4939-1905-5
eBook Packages: Computer Science, Computer Science (R0)
