Abstract:
Deep Learning architectures are extensively adopted as the core machine learning framework in both industry and academia. With large amounts of data at their disposal, th...Show MoreMetadata
Abstract:
Deep Learning architectures are extensively adopted as the core machine learning framework in both industry and academia. With large amounts of data at their disposal, these architectures can autonomously extract highly descriptive features for any type of input signals. However, the extensive volume of data combined with the demand for high computational resources, are introducing new challenges in terms of computing platforms. The work herein presented explores the performance of Deep Learning in the field of astrophysics, when conducted on a distributed environment. To set up such an environment, we capitalize on TensorFlowOnSpark, which combines both TensorFlow's dataflow graphs and Spark's cluster management. We report on the performance of a CPU cluster, considering both the number of training nodes and data distribution, while quantifying their effects via the metrics of training accuracy and training loss. Our results indicate that distribution has a positive impact on Deep Learning, since it accelerates our network's convergence for a given number of epochs. However, network traffic adds a significant amount of overhead, rendering it suitable for mostly very deep models or in big Data Analytics.
Date of Conference: 02-06 September 2019
Date Added to IEEE Xplore: 18 November 2019
ISBN Information: