DOI: 10.1145/3432261.3432262
research-article

A Deep Reinforcement Learning Method for Solving Task Mapping Problems with Dynamic Traffic on Parallel Systems

Published: 20 January 2021

ABSTRACT

Efficiently mapping application communication patterns to the network topology is critical for optimizing the performance of communication-bound applications on parallel computing systems. The problem has been studied extensively, but prior work mostly formulates it as finding an isomorphic mapping between two static graphs whose edges are annotated with traffic volume and network bandwidth. In practice, network performance is difficult to estimate accurately, and communication patterns often change over time and are not easily obtained. This work therefore proposes a deep reinforcement learning (DRL) approach that uses the performance predictions and runtime communication behaviors provided by a simulator to learn an efficient task mapping algorithm. We extensively evaluated our approach using both synthetic and real applications with varied communication patterns on Torus and Dragonfly networks. Compared with several existing approaches from the literature and from software libraries, our approach found task mappings that consistently achieved comparable or better application performance. For a real application in particular, our approach improved performance by 11% and 16% on average on Torus and Dragonfly networks, respectively, whereas the average improvements of the other approaches were all less than 6%.
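
To make the static-graph formulation mentioned above concrete, the following minimal sketch scores a task mapping on a small torus with a hop-bytes cost (traffic volume times hop distance, summed over communicating pairs). This is not the paper's DRL method: the hop-bytes metric, the function names, the 2x2 torus, and the traffic numbers are all assumptions chosen for illustration.

```python
from itertools import permutations


def torus_hops(a, b, dims):
    """Minimal hop count between nodes a and b on a torus with wraparound links."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def hop_bytes(traffic, mapping, dims):
    """Sum of traffic volume times hop distance over all communicating task pairs."""
    return sum(vol * torus_hops(mapping[s], mapping[t], dims)
               for (s, t), vol in traffic.items())


# Hypothetical 4-task ring exchange on a 2x2 torus (illustrative numbers only).
dims = (2, 2)
nodes = [(0, 0), (0, 1), (1, 0), (1, 1)]
traffic = {(0, 1): 10, (1, 2): 10, (2, 3): 10, (3, 0): 10}

# Naive mapping: task i goes to the i-th node in row-major order.
naive = dict(zip(range(4), nodes))

# Exhaustive search over all placements; only feasible for tiny instances.
best = min((dict(zip(range(4), p)) for p in permutations(nodes)),
           key=lambda m: hop_bytes(traffic, m, dims))

print(hop_bytes(traffic, naive, dims))  # 60
print(hop_bytes(traffic, best, dims))   # 40
```

A cost of this kind depends only on fixed traffic volumes and hop distances, so it cannot reflect traffic that changes over time or network performance that is hard to estimate, which are the limitations the abstract points to.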


            Published in

            HPCAsia '21: The International Conference on High Performance Computing in Asia-Pacific Region
            January 2021
            143 pages
            ISBN: 9781450388429
            DOI: 10.1145/3432261

            Copyright © 2021 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States


            Qualifiers

            • research-article
            • Research
            • Refereed limited

            Acceptance Rates

            Overall Acceptance Rate: 69 of 143 submissions, 48%
