Evolutionary multi-level acyclic graph partitioning

Journal of Heuristics

Abstract

Directed graphs are widely used to model data flow and execution dependencies in streaming applications. This makes it possible to apply graph partitioning algorithms to the problem of parallelizing execution on multiprocessor architectures under hardware resource constraints. However, due to program memory restrictions in embedded multiprocessor systems, applications need to be divided into parts without cyclic dependencies. This can be achieved with a second, subsequent graph partitioning step that carries an additional acyclicity constraint. We make four main contributions. First, we show that this more constrained version of the graph partitioning problem is NP-complete and present linear-time heuristics for it. Second, we integrate these heuristics into an existing multi-level graph partitioning framework to better handle large graphs, which achieves a 9% reduction of the edge cut compared to the previous single-level algorithm. Third, building on this, we engineer an evolutionary algorithm to further reduce the cut, achieving a 30% reduction on average compared to the state of the art. Finally, we integrate the partitioning heuristics into a graph compiler for an embedded multiprocessor architecture and show that this can reduce the amount of communication for a real-world imaging application and thereby accelerate it by an average of 11%. We also show that, using the heuristics, the compiler can emit optimized code for vastly different hardware platforms. In addition, we demonstrate how a custom fitness function for the evolutionary algorithm can be used to optimize other objectives, such as load balancing, when communication volume is not the dominant cost on a given hardware platform.
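To make the terms in the abstract concrete, the following is a minimal Python sketch of the two quantities an acyclic partitioner has to track: the edge cut, and whether the blocks themselves form a directed acyclic graph (checked here with a Kahn-style topological sort on the quotient graph). It is an illustration only, not the heuristics or the evolutionary algorithm from the paper. The function names, the edge-list representation, and the `max_block_load` objective (one possible load-balancing measure a custom fitness function might use) are all assumptions made for this example.

```python
from collections import defaultdict, deque

def edge_cut(edges, block):
    """Sum the weights of edges whose endpoints lie in different blocks."""
    return sum(w for u, v, w in edges if block[u] != block[v])

def is_acyclic_partition(edges, block):
    """Return True if the quotient graph (one node per block) has no cycle."""
    blocks = set(block.values())
    succ = defaultdict(set)
    for u, v, _ in edges:
        if block[u] != block[v]:
            succ[block[u]].add(block[v])
    indeg = {b: 0 for b in blocks}
    for targets in succ.values():
        for c in targets:
            indeg[c] += 1
    # Kahn-style topological sort: repeatedly remove blocks with in-degree 0.
    queue = deque(b for b in blocks if indeg[b] == 0)
    removed = 0
    while queue:
        b = queue.popleft()
        removed += 1
        for c in succ.get(b, ()):
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    # Every block was removed exactly when no cyclic dependency exists.
    return removed == len(blocks)

def max_block_load(node_weight, block):
    """Hypothetical alternative objective: the heaviest block's total weight."""
    load = defaultdict(int)
    for node, w in node_weight.items():
        load[block[node]] += w
    return max(load.values())

# A small diamond-shaped DAG with unit edge weights (illustrative data).
edges = [("a", "b", 1), ("a", "c", 1), ("b", "d", 1), ("c", "d", 1)]
node_weight = {"a": 1, "b": 1, "c": 1, "d": 1}

valid = {"a": 0, "b": 0, "c": 1, "d": 1}    # block 0 only feeds block 1
invalid = {"a": 0, "d": 0, "b": 1, "c": 1}  # block 0 -> block 1 -> block 0

print(edge_cut(edges, valid), is_acyclic_partition(edges, valid))
print(edge_cut(edges, invalid), is_acyclic_partition(edges, invalid))
print(max_block_load(node_weight, valid))
```

Both partitions are perfectly balanced, but only the first satisfies the acyclicity constraint. In the second, the two blocks depend on each other, which is exactly the situation the constraint is meant to rule out.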

Author information

Corresponding author

Correspondence to Merten Popp.

About this article

Cite this article

Moreira, O., Popp, M. & Schulz, C. Evolutionary multi-level acyclic graph partitioning. J Heuristics 26, 771–799 (2020). https://doi.org/10.1007/s10732-020-09448-8

  • DOI: https://doi.org/10.1007/s10732-020-09448-8
