A topical evaluation and discussion of data movement technologies for data-intensive scientific applications

  • Software Article
  • Earth Science Informatics

Abstract

Transferring large volumes of information from one location to potentially many others that are geographically distributed and connected by varying networks is still prevalent in modern scientific data systems. This is despite the movement to push computation to the data and to reduce data movement needed to compute answers to challenging scientific problems, to disseminate information to the scientific community, and to acquire data for curation and enrichment. It is therefore imperative that decisions about data movement systems and architectures be backed by both analytical rigor and empirical evidence and measurement. The purpose of this study is to expand on the work performed by our research team over the last decade and to take a fresh look at the evaluation of multiple topical data transfer technologies in use cases derived from data-intensive scientific systems and applications in Earth science. We report on the evaluation of a set of data movement technologies against a set of empirically derived comparison dimensions. Based on this evaluation, we make recommendations for selecting appropriate data movement technologies in scientific applications and scenarios.

Notes

  1. Data movement and data transfer are used interchangeably throughout the paper.

Acknowledgments

Support was provided by the NASA Earth Sciences Division, NASA NCA (ID: 11-NCA11-0028), NASA’s Advanced Information Systems Technology (AIST) program (ID: AIST-QRS-12-0002), and the NASA Computational Modeling and Cyberinfrastructure (CMAC) program (11-CMAC11-0011). In addition, funding was provided by the National Science Foundation ExArch program (ID: 1125798), a component of the G8 initiative. Valuable contributions to the RCMES activity by way of collaboration come from the World Climate Research Program (WCRP) Coordinated Regional Climate Downscaling Experiment (CORDEX), the North American Regional Climate Change Assessment Program (NARCCAP), the Climate & Development Knowledge Network (CDKN) and the University of Cape Town, and PCMDI through support of the obs4MIPs activity.

Author information

Correspondence to Chris A. Mattmann.

Additional information

Communicated by: H. A. Babaie

Cite this article

Mattmann, C.A., Cinquini, L., Zimdars, P. et al. A topical evaluation and discussion of data movement technologies for data-intensive scientific applications. Earth Sci Inform 9, 247–262 (2016). https://doi.org/10.1007/s12145-015-0243-1

  • DOI: https://doi.org/10.1007/s12145-015-0243-1