ABSTRACT
We discuss the impact of clouds and grid technology on scientific computing, using examples from a variety of fields, especially the life sciences. We cover the growing importance of data analysis and note that it is better suited to these modern architectures than the large simulations (particle dynamics and partial differential equation solution) that are the mainstream use of large-scale "massively parallel" supercomputers. Grids remain important for supporting distributed data collection and archiving, while clouds are replacing, and will continue to replace, grids for large-scale analysis of the data.
We discuss the structure of algorithms (and the associated applications) that will run on current clouds and use either the basic "on-demand" computing paradigm or higher-level frameworks based on MapReduce and its extensions. Comparing the performance of MPI (the mainstay of scientific computing) and MapReduce, both theoretically and experimentally, shows that current MapReduce implementations run well on algorithms that are a "Map" followed by a "Reduce" but perform poorly on algorithms that iterate over many such phases. Several important algorithms, including parallel linear algebra, fall into the latter class. One can define MapReduce extensions to accommodate iterative map and reduce, but these have less fault tolerance than basic MapReduce. We discuss clustering, dimension reduction, and sequence assembly and annotation as example algorithms.
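To make the distinction between a single "Map" followed by a "Reduce" and an algorithm that iterates over many such phases concrete, here is a minimal single-process sketch of k-means clustering written as repeated map and reduce steps. It is an illustration only, not the implementation evaluated in the paper: the function names (map_phase, reduce_phase, kmeans), the convergence tolerance, and the rule that an empty cluster keeps its old center are all assumptions made for this example.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_phase(points, centers):
    """Map: emit (index of nearest center, point) for every input point."""
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        yield nearest, p


def reduce_phase(pairs, centers):
    """Reduce: replace each center by the mean of the points assigned to it."""
    groups = defaultdict(list)
    for key, p in pairs:
        groups[key].append(p)
    new_centers = list(centers)  # assumption: an empty cluster keeps its center
    for key, members in groups.items():
        new_centers[key] = tuple(sum(c) / len(members) for c in zip(*members))
    return new_centers


def kmeans(points, centers, max_iters=100, tol=1e-9):
    """Run map+reduce repeatedly; return (final centers, iterations used)."""
    iteration = 0
    for iteration in range(1, max_iters + 1):
        new_centers = reduce_phase(map_phase(points, centers), centers)
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:  # centers stopped moving: converged
            break
    return centers, iteration


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(kmeans(data, [(0, 0), (10, 10)]))
```

Each pass of the loop in kmeans corresponds to one complete map and reduce job. In a framework that launches every job independently, the input points would be re-read and the centers redistributed on each pass, which is the per-iteration cost that iterative extensions aim to reduce by keeping data resident between phases.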
Index Terms
- Algorithms and application for grids and clouds
Recommendations
Large scale data analytics on clouds
CloudDB '12: Proceedings of the fourth international workshop on Cloud data management
We summarize important overall issues affecting use of clouds to support Data Science. We describe the mapping of different applications to HPCC and Cloud systems and the architecture that support data analytics that is interoperable between these ...
MapReduce in MPI for Large-scale graph algorithms
We describe a parallel library written with message-passing (MPI) calls that allows algorithms to be expressed in the MapReduce paradigm. This means the calling program does not need to include explicit parallel code, but instead provides "map" and "...
Application Level Interoperability between Clouds and Grids
GPC '09: Proceedings of the 2009 Workshops at the Grid and Pervasive Computing Conference
SAGA is a high-level programming interface which provides the ability to develop distributed applications in an infrastructure independent way. In an earlier paper, we discussed how SAGA was used to develop a version of MapReduce which provided the user ...