Abstract
Directed graphs are widely used to model data flow and execution dependencies in streaming applications. This makes it possible to apply graph partitioning algorithms to the problem of parallelizing execution on multiprocessor architectures under hardware resource constraints. However, due to program memory restrictions in embedded multiprocessor systems, applications need to be divided into parts without cyclic dependencies. We found that this can be done by a subsequent second graph partitioning step with an additional acyclicity constraint. We make four main contributions. First, we show that this more constrained version of the graph partitioning problem is NP-complete and present linear-time heuristics. Second, we integrate these heuristics into an existing multi-level graph partitioning framework to better handle large graphs, which achieves a 9% reduction of the edge cut compared to the previous single-level algorithm. Third, building on this, we engineer an evolutionary algorithm that further reduces the cut, achieving a 30% reduction on average compared to the state of the art. Finally, we integrate the partitioning heuristics into a graph compiler for an embedded multiprocessor architecture and show that this can reduce the amount of communication for a real-world imaging application and thereby accelerate it by an average of 11%. We also show that, using the heuristics, the compiler can emit optimized code for vastly different hardware platforms. In addition, we demonstrate how a custom fitness function for the evolutionary algorithm can be used to optimize other objectives, such as load balancing, when communication volume is not the dominant cost on a given hardware platform.
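To make the acyclicity constraint concrete, the following minimal sketch evaluates a given partition of a small directed acyclic graph. It is an illustration only, not the heuristics from the paper: the function names `edge_cut` and `quotient_is_acyclic`, the edge-list representation, and the unit edge weights are all assumptions made for this example. A partition is treated as acyclic if contracting each block to a single node yields a graph without cycles, which is checked here with a standard topological-sort pass.

```python
from collections import defaultdict, deque


def edge_cut(edges, block):
    """Count edges whose endpoints lie in different blocks (unit weights assumed)."""
    return sum(1 for u, v in edges if block[u] != block[v])


def quotient_is_acyclic(edges, block):
    """Return True if contracting every block to one node leaves no cycle."""
    blocks = set(block.values())
    succ = defaultdict(set)  # successors of each block in the contracted graph
    for u, v in edges:
        bu, bv = block[u], block[v]
        if bu != bv:
            succ[bu].add(bv)

    indeg = {b: 0 for b in blocks}
    for b in list(succ):
        for c in succ[b]:
            indeg[c] += 1

    # Repeatedly remove blocks that have no incoming edges left.
    queue = deque(b for b in blocks if indeg[b] == 0)
    removed = 0
    while queue:
        b = queue.popleft()
        removed += 1
        for c in succ[b]:
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    # If some block could never be removed, it lies on a cycle.
    return removed == len(blocks)


# Hypothetical diamond-shaped data-flow graph: a -> {b, c} -> d.
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]

# Blocks {a, b} and {c, d}: all cut edges point from block 0 to block 1.
good = {"a": 0, "b": 0, "c": 1, "d": 1}
# Blocks {a, d} and {b, c}: block 0 feeds block 1, which feeds block 0 again.
bad = {"a": 0, "d": 0, "b": 1, "c": 1}

print(edge_cut(edges, good), quotient_is_acyclic(edges, good))
print(edge_cut(edges, bad), quotient_is_acyclic(edges, bad))
```

In this toy instance the second assignment is balanced but creates a cyclic dependency between its two blocks, which is the kind of partition the additional constraint rules out.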
Cite this article
Moreira, O., Popp, M. & Schulz, C. Evolutionary multi-level acyclic graph partitioning. J Heuristics 26, 771–799 (2020). https://doi.org/10.1007/s10732-020-09448-8
DOI: https://doi.org/10.1007/s10732-020-09448-8