
Using distributed ledger technology to democratize neural network training

Applied Intelligence

Abstract

Artificial Intelligence has regained research interest, primarily because of big data. The expansion of the Internet, social networks and online sensors leads to the generation of enormous amounts of information daily. This unprecedented data availability has boosted Machine Learning, and Deep Neural Networks are among the research areas that have benefited most. Many use cases nowadays require huge models with millions of parameters, and big data have proven essential to training them properly. The scientific community has proposed several methods to produce more accurate models. Usually, these methods require high-performance infrastructure, which limits their applicability to large organizations and institutions with the necessary funds. Another source of concern is privacy: anyone using the leased processing power of a remote data center must trust another entity with their data, and in many cases sensitive data have been leaked, either for financial exploitation or due to security flaws. However, little research addresses open communities of individuals with commodity hardware who wish to join forces in a non-binding way and without a central authority. Our work on LEARNAE attempts to fill this gap by providing a way to train Artificial Neural Networks that features decentralization, data ownership and fault tolerance. This article adds some important pieces to the puzzle: it studies the resilience of LEARNAE when dealing with network disruptions and proposes a novel way of embedding low-energy sensors from the Internet of Things domain, while retaining the established distributed philosophy.
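The abstract describes decentralized, peer-to-peer training without a central authority. As a purely illustrative aid, and not the LEARNAE implementation, the sketch below simulates gossip-style averaging of model parameters between peers, with randomly dropped exchanges standing in for network disruptions; all names (Peer, gossip_round, drop_prob) are hypothetical.

```python
# Illustrative sketch only: toy gossip-style averaging of model weights
# between peers, in the spirit of the decentralized training described in
# the abstract. Not the authors' method; names are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

class Peer:
    """A node holding its own copy of the model parameters."""
    def __init__(self, n_params: int):
        self.weights = rng.normal(size=n_params)

def gossip_round(peers, drop_prob: float = 0.2):
    """One round: each peer contacts a random neighbour and, if the
    exchange is not dropped (simulating a network disruption), both
    peers move to the average of their parameter vectors."""
    for i in rng.permutation(len(peers)):
        j = rng.integers(len(peers))
        if i == j or rng.random() < drop_prob:
            continue  # failed or skipped exchange; peers keep their weights
        avg = (peers[i].weights + peers[j].weights) / 2.0
        peers[i].weights = avg.copy()
        peers[j].weights = avg.copy()

peers = [Peer(n_params=10) for _ in range(8)]
for _ in range(50):
    gossip_round(peers)

# Disagreement between peers shrinks even though no central server exists.
spread = np.std([p.weights for p in peers], axis=0).mean()
print(f"mean parameter spread after gossip: {spread:.4f}")
```

Even with a fraction of exchanges dropped, the spread between peer models shrinks over rounds, which conveys the intuition behind fault-tolerant averaging without a central coordinator.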



Availability of data and material

For the experiments, the following publicly available dataset was used: HEPMASS (see Note 7), a dataset for training systems on exotic particle detection.
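For readers who want to reproduce the setup, a minimal loading sketch follows, assuming the UCI HEPMASS files have been downloaded and decompressed locally as all_train.csv / all_test.csv and that the first column holds the class label; these file names and the column layout are assumptions, not details taken from the article.

```python
# Minimal sketch for loading the HEPMASS dataset referenced above.
# Assumption: files downloaded from the UCI repository (see Note 7) and
# decompressed locally; first column is the class label, the rest are
# kinematic features.
import pandas as pd

train = pd.read_csv("all_train.csv")
X_train = train.iloc[:, 1:].to_numpy()  # feature columns
y_train = train.iloc[:, 0].to_numpy()   # class label (signal vs. background)

print(X_train.shape, y_train.shape)
```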

Code Availability

Code and deployment instructions are available from the authors upon request.

Notes

  1. https://en.wikipedia.org/wiki/Gossip_protocol

  2. Apache Spark website, https://spark.apache.org

  3. Bitswap webpage, https://github.com/ipfs/specs/tree/master/bitswap

  4. Statistics webpage, https://www.statista.com/statistics/802690/worldwide-connected-devices-by-access-technology

  5. Project whitepaper, https://iota.org/IOTA_Whitepaper.pdf

  6. Device webpage, https://www.raspberrypi.org/products/raspberry-pi-3-model-b

  7. http://archive.ics.uci.edu/ml/datasets/hepmass


Funding

This research did not receive funding from any source.

Author information


Corresponding author

Correspondence to Spyridon Nikolaidis.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Nikolaidis, S., Refanidis, I. Using distributed ledger technology to democratize neural network training. Appl Intell 51, 8288–8304 (2021). https://doi.org/10.1007/s10489-021-02340-3
