
Big Data Programming Models

Abstract

Big Data programming models define the style of programming and the interface paradigm that developers use to write big data applications and programs. Programming models are normally the core feature of big data frameworks: they implicitly affect the execution model of big data processing engines, and they also drive the way users express and construct big data applications and programs. In this chapter, we comprehensively investigate different programming models for big data frameworks, with comparisons and concrete code examples.
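To make the claim that a programming model shapes the execution model a little more concrete, here is a minimal single-process sketch of the MapReduce style of programming, written in Python purely for illustration. It is not taken from the chapter, and `run_mapreduce`, `wc_map`, and `wc_reduce` are hypothetical names rather than any framework's API. A real engine would distribute the map, shuffle, and reduce phases across a cluster; this sketch only shows the shape of the contract between user code and engine.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def run_mapreduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Hypothetical toy engine: runs map, shuffle, and reduce in one process."""
    groups: Dict[str, List[int]] = defaultdict(list)
    # Map phase: each record is processed independently and may emit
    # zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle: intermediate values are grouped by key.
            groups[key].append(value)
    # Reduce phase: each key's group of values is aggregated independently.
    return {key: reducer(key, values) for key, values in groups.items()}


def wc_map(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated word in the line.
    for word in line.lower().split():
        yield word, 1


def wc_reduce(word: str, counts: List[int]) -> int:
    # Sum all partial counts collected for one word.
    return sum(counts)


counts = run_mapreduce(["big data", "big models"], wc_map, wc_reduce)
print(counts)  # {'big': 2, 'data': 1, 'models': 1}
```

Because the user supplies only a per-record function and a per-key function, an engine is free to run map calls and reduce calls in parallel on different machines; the choice of programming model is what grants the engine that freedom.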

Author information

Corresponding author

Correspondence to Dongyao Wu.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Wu, D., Sakr, S., Zhu, L. (2017). Big Data Programming Models. In: Zomaya, A., Sakr, S. (eds) Handbook of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-49340-4_2

  • DOI: https://doi.org/10.1007/978-3-319-49340-4_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49339-8

  • Online ISBN: 978-3-319-49340-4

  • eBook Packages: Computer Science, Computer Science (R0)
