Abstract
Big data programming models represent the style of programming and provide the interface paradigm through which developers write big data applications and programs. Programming models are normally the core feature of big data frameworks, as they implicitly affect the execution model of big data processing engines and also drive the way users express and construct big data applications and programs. In this chapter, we comprehensively investigate different programming models for big data frameworks, with comparisons and concrete code examples.
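As a minimal illustration of how a programming model shapes the way an application is expressed, the sketch below simulates the MapReduce model (map, shuffle, reduce) for word counting in a single Python process. It is an illustrative sketch under stated assumptions, not the Hadoop API: `map_fn`, `reduce_fn` and `run_mapreduce` are hypothetical names introduced here, and a real engine would distribute these phases across a cluster.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator, List, Tuple

# User-supplied map function: emits (word, 1) for every word in one input record.
def map_fn(record: str) -> Iterator[Tuple[str, int]]:
    for word in record.split():
        yield word.lower(), 1

# User-supplied reduce function: combines all values observed for one key.
def reduce_fn(key: str, values: Iterable[int]) -> Tuple[str, int]:
    return key, sum(values)

def run_mapreduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], Tuple[str, int]],
) -> List[Tuple[str, int]]:
    """Hypothetical single-process stand-in for a MapReduce engine."""
    # Map phase: apply the mapper to every input record independently.
    pairs = [pair for record in records for pair in mapper(record)]
    # Shuffle phase: group all intermediate values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce phase: call the reducer once per distinct key, in sorted key order.
    return [reducer(key, groups[key]) for key in sorted(groups)]

if __name__ == "__main__":
    docs = ["big data programming", "Big data models"]
    # prints [('big', 2), ('data', 2), ('models', 1), ('programming', 1)]
    print(run_mapreduce(docs, map_fn, reduce_fn))
```

The user writes only `map_fn` and `reduce_fn`; grouping and ordering (and, in a real framework, partitioning and fault tolerance) are left to the engine, which is the sense in which the programming model constrains the execution model.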
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Wu, D., Sakr, S., Zhu, L. (2017). Big Data Programming Models. In: Zomaya, A., Sakr, S. (eds) Handbook of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-49340-4_2
DOI: https://doi.org/10.1007/978-3-319-49340-4_2
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49339-8
Online ISBN: 978-3-319-49340-4
eBook Packages: Computer Science, Computer Science (R0)