Methodological Approach to Data-Centric Cloudification of Scientific Iterative Workflows

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10048)

Abstract

The computational complexity of scientific computing models and the constantly increasing amount of their input data are threatening their scalability. In addition, this trend is leading towards more data-intensive scientific computing, thus raising the need to combine techniques and infrastructures from the HPC and big data worlds. This paper presents a methodological approach to cloudify generalist iterative scientific workflows, with a focus on improving data locality and preserving performance. The methodology was evaluated by applying it to a hydrological simulator, EnKF-HGS. The design was implemented using Apache Spark and assessed on a local cluster and on Amazon Elastic Compute Cloud (EC2) against the original version to evaluate performance and scalability.

S. Caíno-Lores—This work has been partially funded under the grant TIN2013-41350-P of the Spanish Ministry of Economics and Competitiveness, the COST Action IC1305 “Network for Sustainable Ultrascale Computing Platforms” (NESUS), and the FPU Training Program for Academic and Teaching Staff FPU15/00422 by the Spanish Ministry of Education.

Notes

  1. The Apache Spark project is available at http://spark.apache.org/.

  2. HDFS and YARN belong to the Apache Hadoop project, accessible at http://hadoop.apache.org/.

  3. Lustre is an open-source file system available at http://lustre.org/.

  4. More information on GlusterFS is available at https://www.gluster.org/.

Author information

Corresponding author

Correspondence to Silvina Caíno-Lores.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Caíno-Lores, S., Lapin, A., Kropf, P., Carretero, J. (2016). Methodological Approach to Data-Centric Cloudification of Scientific Iterative Workflows. In: Carretero, J., Garcia-Blas, J., Ko, R., Mueller, P., Nakano, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2016. Lecture Notes in Computer Science, vol 10048. Springer, Cham. https://doi.org/10.1007/978-3-319-49583-5_36

  • DOI: https://doi.org/10.1007/978-3-319-49583-5_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49582-8

  • Online ISBN: 978-3-319-49583-5

  • eBook Packages: Computer Science, Computer Science (R0)