Abstract:
In the history of Computer Science, the act of `delegation' has been the greatest multiplier of society's problem-solving ability. A scientist working on detecting anomal...Show MoreMetadata
Abstract:
In the history of Computer Science, the act of `delegation' has been the greatest multiplier of society's problem-solving ability. A scientist working on detecting anomalies in a phenomenon, does not need to re-invent matrix multiplication techniques to solve her problem. Scientific workflows provide ultimate `delegation' mechanism - where a domain scientist can completely forget the specifics of `how' her program will execute on a large cluster in an efficient and cost-effective manner and can instead focus on the mathematical formulation and theoretical robustness of her solution. We present here an approach that directly aims to make the execution of Scientific Workflows more reliable, robust and efficient. We aim that the work presented in this paper will propel the larger effort, from the scientific workflow community, of making scientific workflow execution as simple, efficient and robust as a JOIN operation in a modern database. Specifically, we apply Deep Learning techniques to develop a mechanism that forecasts the final state (success or failure) of a dynamic job in a large-scale particle physics experiment, with minimal data gathering, and as early as possible in job's life cycle. The key advantage of having a predictive mechanism to identify and anticipate failure-prone jobs is the potential for designing intelligent Fault Tolerance mechanisms to handle anomalous events. We achieve a 14% improvement in computational resources utilization, and an overall classification accuracy of 85% on real tasks executed in a High Energy Physics Computing workflow. To the best of our knowledge, this is the most exhaustive and first of its kind study of neural network architectures in context of a real-dataset profiled from a large-scale scientific workflow.
Date of Conference: 10-13 December 2018
Date Added to IEEE Xplore: 24 January 2019
ISBN Information: