Abstract
This paper proposes an efficient asynchronous stochastic second-order learning algorithm for distributed training of neural networks (NNs). The proposed algorithm, named distributed bounded stochastic diagonal Levenberg-Marquardt (distributed B-SDLM), is based on the B-SDLM algorithm, which converges quickly and incurs only minimal computational overhead compared with the stochastic gradient descent (SGD) method. The algorithm is implemented on the parameter server thread model in the MPICH implementation. Experiments on the MNIST dataset show that training with the distributed B-SDLM on a 16-core CPU cluster allows the convolutional neural network (CNN) model to converge much faster, with speedups of \(6.03{\times }\) and \(12.28{\times }\) to reach training and testing loss values of 0.01 and 0.08, respectively. This also yields significantly shorter times to reach a given classification accuracy (\(5.67{\times }\) and \(8.72{\times }\) faster to reach \(99\,\%\) training and \(98\,\%\) testing accuracies on the MNIST dataset, respectively).
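To illustrate the kind of update rule the abstract refers to, the sketch below shows a generic stochastic diagonal Levenberg-Marquardt step, in which each parameter gets an individual learning rate scaled by an estimate of the diagonal of the Hessian. This is an illustrative sketch only: the function name, the constants `eta` and `mu`, and the bounding interval are assumptions and are not taken from the paper's actual formulation.

```python
import numpy as np

def sdlm_update(theta, grad, hessian_diag, eta=0.01, mu=0.1,
                h_min=0.0, h_max=10.0):
    """One stochastic diagonal Levenberg-Marquardt step (illustrative).

    Each parameter is updated with an individual step size
    eta / (h + mu), where h is a (bounded) estimate of the
    corresponding diagonal Hessian entry and mu regularizes
    near-zero curvature. Bounding h is a stand-in for the
    "bounded" aspect of B-SDLM; the actual bounds in the
    paper may differ.
    """
    h = np.clip(hessian_diag, h_min, h_max)   # bound curvature estimates
    step = eta / (h + mu)                     # per-parameter learning rate
    return theta - step * grad

# Toy usage: minimize f(x) = x^2 (gradient 2x, diagonal Hessian 2).
theta = np.array([1.0])
for _ in range(50):
    grad = 2.0 * theta
    theta = sdlm_update(theta, grad, hessian_diag=np.array([2.0]))
print(theta)  # shrinks toward 0
```

In the distributed setting described in the abstract, worker threads would compute `grad` (and the curvature estimate) on their own mini-batches and send them asynchronously to a parameter server that applies this update to the shared `theta`.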
Acknowledgements
This work is supported by Universiti Teknologi Malaysia (UTM) and the Ministry of Science, Technology and Innovation of Malaysia (MOSTI) under the Science Fund Grant No. 4S116.
© 2016 Springer International Publishing Switzerland
Cite this paper
Liew, S.S., Khalil-Hani, M., Bakhteri, R. (2016). Distributed B-SDLM: Accelerating the Training Convergence of Deep Neural Networks Through Parallelism. In: Booth, R., Zhang, ML. (eds) PRICAI 2016: Trends in Artificial Intelligence. PRICAI 2016. Lecture Notes in Computer Science(), vol 9810. Springer, Cham. https://doi.org/10.1007/978-3-319-42911-3_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42910-6
Online ISBN: 978-3-319-42911-3