Big Data Programming Models

Wu, Dongyao; Sakr, Sherif; Zhu, Liming

doi:10.1007/978-3-319-49340-4_2

Chapter
First Online: 26 February 2017

Abstract

Big data programming models represent the style of programming and define the interface paradigm that developers use to write big data applications and programs. Programming models are normally the core feature of big data frameworks, as they implicitly affect the execution model of big data processing engines and also drive the way users express and construct big data applications and programs. In this chapter, we comprehensively investigate different programming models for big data frameworks, with comparisons and concrete code examples.

Author information

Authors and Affiliations

Data61, CSIRO, Sydney, NSW, Australia
Dongyao Wu, Sherif Sakr & Liming Zhu
School of Computer Science and Engineering, University of New South Wales, Sydney, NSW, Australia
Dongyao Wu, Sherif Sakr & Liming Zhu
King Saud Bin Abdulaziz University for Health Sciences, National Guard, Riyadh, Saudi Arabia
Sherif Sakr

Corresponding author

Correspondence to Dongyao Wu.

Editor information

Editors and Affiliations

School of Information Technologies, The University of Sydney, Sydney, New South Wales, Australia
Albert Y. Zomaya
The School of Computer Science, The University of New South Wales, Eveleigh, New South Wales, Australia
Sherif Sakr

About this chapter

Cite this chapter

Wu, D., Sakr, S., Zhu, L. (2017). Big Data Programming Models. In: Zomaya, A., Sakr, S. (eds) Handbook of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-49340-4_2

DOI: https://doi.org/10.1007/978-3-319-49340-4_2
Published: 26 February 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49339-8
Online ISBN: 978-3-319-49340-4
eBook Packages: Computer Science, Computer Science (R0)
