Big Data Operations: Basis for Benchmarking a Data Grid

  • Conference paper
Advancing Big Data Benchmarks (WBDB 2013, WBDB 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8585)

Abstract

Data operations over a wide area network are complex, and end-to-end implementations vary significantly in their efficiency, failure recovery, and transactional management. Benchmarking these operations is vital given the exponential growth in data size. A critical evaluation of the types of data operations performed within large-scale data management systems, together with a comparison of the efficiency of those operations across implementations, is an appropriate topic for benchmarking in a big data framework. In this paper, we identify the operations that are important in large-scale data management and discuss a few of them in terms of data grid benchmarking. These operations form a set of core abstractions that can define how domain-centric scientific or business workflow applications interact with big data systems. We chose these operational abstractions based on our experience with large-scale distributed systems and data-intensive computation.

Acknowledgement

We acknowledge funding from NSF grant #1247652 “BIGDATA: Mid-Scale: ESCE: DCM: Collaborative Research: DataBridge - A Sociometric System for Long tail Science Data Collections”, NSF grant #0940841 “DataNet Federation Consortium”, and NSF grant #1032732 “SDCI Data Improvement: Improvement and Sustainability of iRODS Data Grid Software for Multi-Disciplinary Community Driven Application”.

Author information

Corresponding author

Correspondence to Arcot Rajasekar.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Rajasekar, A., Moore, R., Huang, S., Xin, Y. (2014). Big Data Operations: Basis for Benchmarking a Data Grid. In: Rabl, T., Raghunath, N., Poess, M., Bhandarkar, M., Jacobsen, H.-A., Baru, C. (eds) Advancing Big Data Benchmarks. WBDB 2013, WBDB 2013. Lecture Notes in Computer Science, vol 8585. Springer, Cham. https://doi.org/10.1007/978-3-319-10596-3_10

  • DOI: https://doi.org/10.1007/978-3-319-10596-3_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10595-6

  • Online ISBN: 978-3-319-10596-3

  • eBook Packages: Computer Science, Computer Science (R0)
