Abstract
The computational complexity of scientific computing models and the constantly increasing amount of their input data are threatening their scalability. In addition, this is leading towards more data-intensive scientific computing, thus raising the need to combine techniques and infrastructures from the HPC and big data worlds. This paper presents a methodological approach to cloudify generalist iterative scientific workflows, with a focus on improving data locality and preserving performance. The methodology was evaluated by applying it to a hydrological simulator, EnKF-HGS. The design was implemented using Apache Spark and assessed on a local cluster and on Amazon Elastic Compute Cloud (EC2) against the original version to evaluate performance and scalability.
S. Caíno-Lores—This work has been partially funded under the grant TIN2013-41350-P of the Spanish Ministry of Economics and Competitiveness, the COST Action IC1305 “Network for Sustainable Ultrascale Computing Platforms” (NESUS), and the FPU Training Program for Academic and Teaching Staff FPU15/00422 by the Spanish Ministry of Education.
Notes
1. The Apache Spark project is available at http://spark.apache.org/.
2. HDFS and YARN belong to the Apache Hadoop project, accessible at http://hadoop.apache.org/.
3. Lustre is an open-source file system available at http://lustre.org/.
4. More information on GlusterFS is available at https://www.gluster.org/.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Caíno-Lores, S., Lapin, A., Kropf, P., Carretero, J. (2016). Methodological Approach to Data-Centric Cloudification of Scientific Iterative Workflows. In: Carretero, J., Garcia-Blas, J., Ko, R., Mueller, P., Nakano, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2016. Lecture Notes in Computer Science, vol 10048. Springer, Cham. https://doi.org/10.1007/978-3-319-49583-5_36
DOI: https://doi.org/10.1007/978-3-319-49583-5_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49582-8
Online ISBN: 978-3-319-49583-5
eBook Packages: Computer Science, Computer Science (R0)