Abstract:
Spark has become a very attractive platform for big data analytics in recent years thanks to unique advantages such as parallelism, fault tolerance, and the reduced complexity of cluster setup. On the Spark platform, users can adjust parameter configurations for different job requirements and specific applications to optimize performance. This creates a problem that cannot be ignored: Spark already exposes more than 180 parameters, and the enormous space of parameter combinations means that manual tuning cannot capture the impact of every parameter on performance. To eliminate the heavy reliance on expert experience and manual operation, we propose Otterman, a parameter-optimization approach that combines the Simulated Annealing algorithm with the Least Squares method and dynamically adjusts parameters according to job type to obtain a configuration that improves performance. Simulated Annealing can find the optimal solution but converges slowly; we use the Least Squares method to substantially speed its convergence toward the optimum. Otterman is simple and easy to apply, with no additional cost. Experiments verify the effectiveness of the approach: the results show that Otterman improves average performance by 30% over the default parameter configuration, with an accuracy of about 68%.
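The core search the abstract describes can be illustrated with a minimal sketch of simulated annealing over a Spark-style parameter space. This is not Otterman's actual implementation: the parameter names are real Spark settings, but their ranges, the step sizes, and the toy cost function standing in for a measured job runtime are all illustrative assumptions, and the Least Squares convergence aid is omitted.

```python
import math
import random

# Illustrative parameter space: real Spark parameter names,
# but the ranges here are assumptions for the sketch.
PARAM_SPACE = {
    "spark.executor.memory": (1, 16),        # GB
    "spark.executor.cores": (1, 8),
    "spark.sql.shuffle.partitions": (50, 400),
}

def job_runtime(config):
    # Toy cost model standing in for an actual measured Spark
    # job runtime; a real tuner would run the job and time it.
    return (abs(config["spark.executor.memory"] - 8)
            + abs(config["spark.executor.cores"] - 4)
            + abs(config["spark.sql.shuffle.partitions"] - 200) / 25.0)

def neighbor(config):
    # Perturb one randomly chosen parameter within its legal range.
    key = random.choice(list(PARAM_SPACE))
    lo, hi = PARAM_SPACE[key]
    new = dict(config)
    new[key] = min(hi, max(lo, config[key] + random.randint(-4, 4)))
    return new

def anneal(steps=3000, t0=10.0, cooling=0.997, seed=0):
    random.seed(seed)
    cur = {k: random.randint(lo, hi) for k, (lo, hi) in PARAM_SPACE.items()}
    best = cur
    t = t0
    for _ in range(steps):
        cand = neighbor(cur)
        delta = job_runtime(cand) - job_runtime(cur)
        # Always accept improvements; accept worse moves with
        # probability exp(-delta / t), which shrinks as t cools.
        if delta <= 0 or random.random() < math.exp(-delta / t):
            cur = cand
        if job_runtime(cur) < job_runtime(best):
            best = cur
        t *= cooling
    return best
```

A caller would replace `job_runtime` with a function that launches the Spark job under `config` and returns its measured duration; the annealing loop itself is unchanged.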
Date of Conference: 10-12 November 2018
Date Added to IEEE Xplore: 03 January 2019
ISBN Information: