Abstract:
Cloud computing technologies now make it possible to distribute massive data applications, such as scientific workflows, and have greatly helped in processing the very large scientific datasets stored across distributed data centers. However, processing massive data through scientific workflows is costly in terms of data transmission, execution delay, and bandwidth. Data placement optimization techniques are therefore necessary to noticeably reduce workflow execution and data transmission costs. When a workflow task requires datasets located in different data centers, placing massive data volumes becomes a hard challenge. In the present work, a data placement strategy for scientific cloud workflows is proposed, based on the fuzzy c-means clustering technique. The proposed methodology is a two-stage strategy. The first stage, an offline one, groups the initial datasets into k data centers and then regroups them via the fuzzy c-means technique. In the second stage, the online one, following execution of the workflow, the generated datasets are placed in the data centers according to their dependencies, again using the same fuzzy c-means technique. The proposed two-stage strategy appears to be effective in reducing the overall data placement amount compared with state-of-the-art strategies.
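To make the clustering step concrete, the following is a minimal sketch of standard fuzzy c-means applied to dataset placement. It is not the authors' implementation. Several things here are assumptions for illustration only: each dataset is represented by a row of a hypothetical dependency matrix `deps` (entry i, j counting the tasks that use both datasets), each cluster is mapped to one data center, the fuzzifier is m = 2, and a newly generated dataset is placed by scoring it against the existing cluster centers. The abstract does not specify any of these details.

```python
import random

def memberships_for(x, centers, m=2.0):
    """Fuzzy membership of point x in each cluster (values sum to 1)."""
    # Euclidean distance to every center, floored to avoid division by zero.
    dist = [max(sum((a - b) ** 2 for a, b in zip(x, v)) ** 0.5, 1e-12)
            for v in centers]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((dc / dj) ** exp for dj in dist) for dc in dist]

def fuzzy_c_means(points, k, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(k)]
        total = sum(row)
        u.append([r / total for r in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by membership^m.
        centers = []
        for c in range(k):
            w = [u[i][c] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / total
                            for d in range(dim)])
        # Membership update from the distances to the new centers.
        new_u = [memberships_for(x, centers, m) for x in points]
        shift = max(abs(new_u[i][c] - u[i][c])
                    for i in range(n) for c in range(k))
        u = new_u
        if shift < tol:
            break
    return centers, u

def place(memberships):
    """Assign each dataset to the cluster (data center) of highest membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]

# Hypothetical dependency matrix for six initial datasets (assumption):
# datasets 0-2 are used together, as are datasets 3-5.
deps = [
    [3, 2, 2, 0, 0, 0],
    [2, 3, 2, 0, 0, 0],
    [2, 2, 3, 0, 0, 0],
    [0, 0, 0, 3, 2, 2],
    [0, 0, 0, 2, 3, 2],
    [0, 0, 0, 2, 2, 3],
]

# Offline stage (sketch): cluster the initial datasets into k = 2 data centers.
centers, u = fuzzy_c_means(deps, k=2)
labels = place(u)

# Online stage (sketch, assumption): place a generated dataset that depends
# on datasets 0-2 by scoring it against the existing cluster centers.
new_dataset = [2, 2, 2, 0, 0, 0]
new_label = place([memberships_for(new_dataset, centers)])[0]
print(labels, new_label)
```

Unlike hard k-means, each dataset keeps a graded membership in every cluster, so a dataset shared by several task groups can be inspected before it is committed to one data center. Taking the highest membership, as `place` does, is only one possible rule.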
Date of Conference: 08-13 July 2018
Date Added to IEEE Xplore: 14 October 2018