Abstract:
Processing Big Data in the cloud is on the increase. An important issue for the efficient execution of Big Data processing jobs on a cloud platform is selecting the best-fitting virtual machine (VM) configuration(s) from the wide range of choices that cloud providers offer. Wise selection of VM configurations can improve performance and reduce cost and energy consumption. It is therefore crucial to explore the available configurations and choose those that best suit each MapReduce application. Profiling a given application on all configurations is costly, time-consuming, and energy-consuming. An alternative is to run the application on a subset of configurations (sample configurations) and estimate its performance on the remaining configurations from the values measured on the samples. We show that the choice of these sample configurations strongly affects the accuracy of the later estimations. Our Smart Configuration Selection (SCS) scheme chooses better representatives from among all configurations through a once-off analysis of the given benchmark performance figures, so as to increase the accuracy of the estimated missing values and, consequently, to choose more accurately the configuration that provides the highest performance. The results show that the SCS choice of sample configurations is very close to the best choice and can reduce the estimation error to 11.58 percent, from the 19.72 percent obtained with random configuration selection. More importantly, using SCS estimations in a makespan minimization algorithm improves the execution time by up to 36.03 percent compared with random sample selection.
Published in: IEEE Transactions on Cloud Computing (Volume: 7, Issue: 3, 01 July-Sept. 2019)
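The sample-then-estimate idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how SCS analyzes the benchmark figures or how missing values are estimated, so everything below is an assumption made for illustration: the estimator (scale the best-matching benchmark profile), the selection criterion (exhaustive search minimizing leave-one-out error over the benchmarks), the function names, and the `BENCH` numbers are all hypothetical and are not the authors' method or data.

```python
from itertools import combinations

# Hypothetical execution times: rows are benchmarks, columns are VM configurations.
BENCH = [
    [120.0, 80.0, 60.0, 45.0],
    [200.0, 150.0, 90.0, 70.0],
    [95.0, 70.0, 50.0, 40.0],
]

def estimate(bench, sample_idx, measured):
    """Predict a full profile from measurements taken on sample_idx only.

    Stand-in estimator (an assumption): fit each benchmark row to the
    measurements with a least-squares scale factor, keep the best-fitting
    row, and return that row multiplied by its scale factor.
    """
    best_err, best_pred = float("inf"), None
    for row in bench:
        ref = [row[j] for j in sample_idx]
        scale = sum(r * m for r, m in zip(ref, measured)) / sum(r * r for r in ref)
        err = sum((scale * r - m) ** 2 for r, m in zip(ref, measured))
        if err < best_err:
            best_err, best_pred = err, [scale * v for v in row]
    return best_pred

def loo_error(bench, sample_idx):
    """Mean relative error on unsampled configurations, holding out one benchmark at a time."""
    hidden = [j for j in range(len(bench[0])) if j not in sample_idx]
    total, count = 0.0, 0
    for i, row in enumerate(bench):
        others = bench[:i] + bench[i + 1:]
        pred = estimate(others, sample_idx, [row[j] for j in sample_idx])
        for j in hidden:
            total += abs(pred[j] - row[j]) / row[j]
            count += 1
    return total / count

def choose_samples(bench, k):
    """Once-off search for the k sample configurations with the lowest hold-out error."""
    n = len(bench[0])
    return min(combinations(range(n), k), key=lambda s: loo_error(bench, s))

if __name__ == "__main__":
    samples = choose_samples(BENCH, 2)
    print("sample configurations:", samples)
    print("hold-out relative error:", loo_error(BENCH, samples))
```

The exhaustive search is only practical for a small number of configurations. It is used here because it makes the comparison with an arbitrary or random subset easy to see: calling `loo_error` on any other pair of columns gives the error that subset would have produced.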