ABSTRACT
Modern neural network (NN) models require ever more data and parameters to perform increasingly complicated tasks. One approach to training such massive NNs is to distribute them across multiple devices, which gives rise to the device placement problem: deciding which part of the model runs on which device. Most state-of-the-art solutions to this problem leverage graph embedding techniques. In this work, we assess the impact of different graph embedding techniques on the quality of device placement, measured by (i) the execution time of the partitioned NN models and (ii) the computation time of the graph embedding technique itself. In particular, we extend Placeto, a state-of-the-art device placement solution, and evaluate two graph embedding techniques, GraphSAGE and P-GNN, against Placeto's original graph embedding model, Placeto-GNN. In terms of execution-time improvement, P-GNN outperforms Placeto-GNN by 23.967%, while GraphSAGE produces 1.165% better results than Placeto-GNN. Regarding computation time, GraphSAGE is 11.569% faster than Placeto-GNN, whereas P-GNN is 6.95% slower.
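To make the role of graph embeddings concrete, the sketch below shows the core of one technique we evaluate: a single GraphSAGE-style mean-aggregation layer applied to a toy computation graph, producing one embedding per NN operation. This is a minimal illustration under assumed inputs, not Placeto's actual implementation; the toy graph, feature values, and names such as `sage_layer` and `neighbors` are invented for this example.

```python
import numpy as np

# Toy computation graph: operation id -> ids of neighboring operations.
neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}

# Per-operation features (e.g., compute cost, output size): 4 nodes x 2 features.
X = np.array([[1.0, 0.5],
              [0.3, 0.9],
              [0.7, 0.2],
              [0.4, 0.6]])

rng = np.random.default_rng(0)
W_self = rng.normal(size=(2, 4))   # weights applied to a node's own features
W_neigh = rng.normal(size=(2, 4))  # weights applied to aggregated neighbor features

def sage_layer(X, neighbors, W_self, W_neigh):
    """One GraphSAGE-style layer: mean-aggregate neighbor features,
    combine with the node's own features, apply ReLU, then L2-normalize."""
    H = np.zeros((X.shape[0], W_self.shape[1]))
    for v in range(X.shape[0]):
        agg = np.mean(X[neighbors[v]], axis=0)   # mean over neighbor features
        H[v] = np.maximum(X[v] @ W_self + agg @ W_neigh, 0.0)
    norms = np.linalg.norm(H, axis=1, keepdims=True)
    return H / np.maximum(norms, 1e-12)          # unit-length embeddings

embeddings = sage_layer(X, neighbors, W_self, W_neigh)
print(embeddings.shape)  # (4, 4): one embedding per operation
```

In a Placeto-style pipeline, per-operation embeddings like these are fed to a reinforcement learning policy that assigns each operation to a device; swapping the embedding model (Placeto-GNN, GraphSAGE, or P-GNN) while keeping the rest of the pipeline fixed is the comparison this work performs.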
REFERENCES
- R. Addanki et al. 2019. Placeto: Learning generalizable device placement algorithms for distributed machine learning. arXiv:1906.08879 (2019).
- J. Dean et al. 2012. Large scale distributed deep networks. In Advances in Neural Information Processing Systems. 1223--1231.
- W. Hamilton et al. 2017. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems. 1024--1034.
- H. Pham et al. 2018. Efficient neural architecture search via parameter sharing. arXiv:1802.03268 (2018).
- S. Hochreiter et al. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735--1780.
- Y. Huang et al. 2018. FlexPS: Flexible parallelism control in parameter server architecture. Proceedings of the VLDB Endowment 11, 5 (2018), 566--579.
- R. Mayer et al. 2017. The TensorFlow partitioning and scheduling problem: It's the critical path!. In DIDL. 1--6.
- A. Mirhoseini et al. 2017. Device placement optimization with reinforcement learning. arXiv:1706.04972 (2017).
- A. Mirhoseini et al. 2018. A hierarchical model for device placement. In ICLR.
- A. Nazi et al. 2019. GAP: Generalizable approximate graph partitioning framework. arXiv:1903.00614 (2019).
- J. Schulman et al. 2017. Proximal policy optimization algorithms. arXiv:1707.06347 (2017).
- A. Sergeev et al. 2018. Horovod: Fast and easy distributed deep learning in TensorFlow. arXiv:1802.05799 (2018).
- A. Vaswani et al. 2017. Attention is all you need. In Advances in Neural Information Processing Systems. 5998--6008.
- Y. Wu et al. 2016. Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv:1609.08144 (2016).
- J. You et al. 2019. Position-aware graph neural networks. arXiv:1906.04817 (2019).
- Y. Gao et al. 2018. Spotlight: Optimizing device placement for training deep neural networks. In ICML. 1676--1684.
- Y. Zhou et al. 2019. GDP: Generalized Device Placement for Dataflow Graphs. arXiv:1910.01578 (2019).