Skip to main content
Log in

Algorithmic aspects of graph reduction for hardware/software partitioning

  • Published:
The Journal of Supercomputing Aims and scope Submit manuscript

Abstract

The hardware/software (HW/SW) partitioning is a major concern in heterogeneous multi-processor system-on-a-chip design, where the large design space prohibits rapid identification of optimal HW/SW solutions to meet tight time-to-market constraints. In this paper, we propose graph reduction techniques to reduce the design space for HW/SW partitioning without sacrificing the partition quality. There are two major phases in the proposed approach: reducible sub-graph searching and sub-graph evaluation and reduction. In the former phase, we design a dynamic programming-based algorithm, named path flow algorithm (PFA), to identify reducible sub-graph candidates for directed acyclic graph (DAG) as most previous works use DAG as task graph model. We also propose algorithm DeLoop to transform an arbitrary directed graph into a DAG such that all reducible sub-graphs on the original graph can be detected by performing algorithm PFA on the DAG. Our approach overcomes the limitation of the existing approach by enabling the identification of candidate sub-graphs in arbitrary task graphs. In latter phase, we propose a reduction model which enables accurate estimation of task execution time on hardware and design a method to select candidate sub-graphs for reduction. Experimental results demonstrate that the proposed methods not only reduce the design space, but also notably improve the partitioning quality since hardware-parallel execution of tasks is taken into account in the proposed sub-graph reduction model.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14

Similar content being viewed by others

References

  1. Arabnia HR, Oliver MA (1989) A transputer network for fast operations on digitised images. Comput Gr Forum 8(1):3–11

  2. Arabnia HR (1990) A parallel algorithm for the arbitrary rotation of digitized images using process-and-data-decomposition approach. J Parallel Distrib Comput. 10(2):188–192

    Article  Google Scholar 

  3. Bhandarkar SM, Arabnia HR (1995) The REFINE multiprocessor theoretical properties and algorithms. Parallel Comput 21(11):1783–1805

    Article  Google Scholar 

  4. Arabnia HR, Smith JW (1993) A reconfigurable interconnection network for imaging operations and its implementation using a multi-stage switching box. In: Proceedings of the 7th annual international high performance computing conference, pp 349–357

  5. Arató P, Mann ZÁ, Orbán A (2005) Algorithmic aspects of hardware/software partitioning. ACM Trans Des Autom Electron Syst (TODAES) 10(1):136–156

  6. Tahaee SA, Jahangir AH (2010) A polynomial algorithm for partitioning problems. ACM Trans Embed Comput Syst (TECS) 9(4):1–38. (Article 34)

  7. Vahid F, Gajski DD, Gong J (1994) A binary-constraint search algorithm for minimizing hardware during hardware/software partitioning. In: Proceedings of the conference on the European design automation (EURO-DAC), pp 214–219

  8. Dick RP, Jha NK (1997) Mogac: a multiobjective genetic algorithm for the co-synthesis of hardware–software embedded systems, In: Proceedings of 1997 IEEE/ACM international conference on computer-aided design (ICCAD), pp 522–529

  9. López-Vallejo M, López JC (2003) On the hardware–software partitioning problem: system modeling and partitioning techniques. ACM Trans Des Autom Electron Syst (TODAES) 8(3):269–297

    Article  Google Scholar 

  10. Vahid F (2002) Partitioning sequential programs for cad using a three-step approach. ACM Trans Des Autom Electron Syst (TODAES) 7(3):413–429

    Article  Google Scholar 

  11. Wu J, Srikanthan T, Tao J (2008) Algorithmic aspects for functional partitioning and scheduling in hardware/software co-design. Des Autom Embed Syst 12(4):345–375

    Article  Google Scholar 

  12. Henkel J (1999) A low power hardware/software partitioning approach for core-based embedded systems. In: Proceedings of the 36th ACM/IEEE design automation conference (DAC), pp 122–127

  13. Stitt G, Vahid F (2002) The energy advantages of microprocessor platforms with on-chip configurable logic. IEEE Des Test Comput 19(6):36–43

    Article  Google Scholar 

  14. Mu J, Lysecky R (2009) Autonomous hardware/software partitioning and voltage/frequency scaling for low-power embedded systems. ACM Trans Des Autom Electron Syst (TODAES) 15(1):1–20

    Article  Google Scholar 

  15. Lysecky R (2007) Low-power warp processor for power efficient high-performance embedded systems. In: Proceedings of the design, automation and test in Europe conference and exhibition (DATE), pp 1–6

  16. Banerjee S, Dutt N (2004) Efficient search space exploration for HW–SW partitioning. In: Proceedings of the IEEE/ACM/IFIP international conference on hardware/software codesign and system synthesis (CODES+ISSS), pp 122–127

  17. Wu J, Srikanthan T, Chen G (2010) Algorithmic aspects of hardware/software partitioning: 1D search algorithms. IEEE Trans Comput 59(4):532–544

    Article  MathSciNet  Google Scholar 

  18. Tahaee SA, Jahangir AH, Habibi-Masouleh H (2008) Improving the performance of heuristic searches with judicious initial point selection. In: Proceedings of the 5th IEEE international symposium on embedded computing, pp 14–19

  19. Wu J, Srikanthan T (2006) Low-complex dynamic programming algorithm for hardware/software partitioning. Inf Process Lett 98(2):41–46

    Article  MATH  Google Scholar 

  20. Niemann R, Marwedel P (1997) An algorithm for hardware/software partitioning using mixed integer linear programming. Des Autom Embed Syst 2(2):165–193

    Article  Google Scholar 

  21. Wu J, Wang P, Lam SK, Srikanthan T (2013) Efficient heuristic and tabu search for hardware/software partitioning. J Supercomput 66(1):118–134

    Article  Google Scholar 

  22. Monshizadeh N, Trentelman HL, Camlibel MK (2014) Projection based model reduction of multi-agent systems using graph partitions. IEEE Trans Control Netw Syst 1(2):145–154

    Article  MathSciNet  Google Scholar 

  23. Shen C, Tong W (2013) A novel method of adopting graph reduction for resource management in parallel computing model. In: Proceedings of the 2013 IEEE international conference on green computing and communications and ieee internet of things and IEEE cyber, physical and social computing, pp 2218–2220

  24. Song Y, Shi G (2012) Hierarchical graph reduction approach to symbolic circuit analysis with data sharing and cancellation-free properties. In: Proceedings of the 2012 17th Asia and South Pacific design automation conference (ASP-DAC), pp 541–546

  25. Xu Y, Chu C (2009) GREMA: graph reduction based efficient mask assignment for double patterning technology. In: Proceedings of the IEEE/ACM international conference on computer-aided design (ICCAD), pp 601–606

  26. Henke J, Ernst R (1995) A path-based technique for estimating hardware runtime in Hw/Sw-cosynthesis. In: Proceedings of the 8th international symposium on system synthesis, pp 116–121

  27. Maciel P, Barros E, Rosenstiel W (1997) Computing communication cost by Petri nets for hardware/software codesign. In: Proceedings of the 8th international workshop on rapid system prototyping (RSP’97) shortening the path from specification to prototype, pp 44–56

  28. Henkel J, Ernst R (1998) High-level estimation techniques for usage in hardware/software co-design. In: Proceedings of the Asia and South Pacific design automation conference (ASP-DAC), pp 353–360

  29. Li Y, Henkel J (1998) A framework for estimating and minimizing energy dissipation of embedded HW/SW systems. In: Proceedings of the 35th design automation conference (DAC), pp 188–193

  30. Junior M, Neto S, Maciel P, Lima R, Ribeiro A, Barreto R, Tavares E, Braga F (2006) Analyzing software performance and energy consumption of embedded systems by probabilistic modeling: an approach based on coloured Petri nets. In: Proceedings of the 27th international conference on applications and theory of Petri nets and other models of concurrency (ICATPN), Turku, pp.261–281 (pp 26–30)

  31. Ye W, Ernst R, Benner T, Henkel J (1993) Fast timing analysis for hardware–software co-synthesis. In: Proceedings of the IEEE international conference on computer design: VLSI in computers and processors (ICCD), pp 452–457

  32. Vahid F, Gajski DD (1995) Closeness metrics for system-level functional partitioning. In: Proceedings of the European design automation conference with EURO-VHDL (EURO-DAC), pp 328–333

  33. Lam SK, Srikanthan T (2009) Rapid design of area-efficient custom instructions for reconfigurable embedded processing. J Syst Archit 55(1):1–14

    Article  Google Scholar 

  34. Lam SK, Srikanthan T, Clarke CT (2011) Architecture-aware technique for mapping area-time efficient custom instructions onto FPGAs. IEEE Trans Comput 60(5):680–692

    Article  MathSciNet  Google Scholar 

  35. Chuong LM, Lam SK, Srikanthan T (2009) Area-time estimation of controller for porting C-based functions onto FPGA. In: Proceedings of the IEEE/IFIP international symposium on rapid system prototyping (RSP), pp 145–151

  36. Zhao K, Bian J (2011) Instruction-level hardware/software partition through DFG exploration. In: Proceedings of the 15th international conference on computer supported cooperative work in design (CSCWD), pp 55–60

  37. Wang D, Li S, Dou Y (2008) Collaborative hardware/software partition of coarse-grained reconfigurable system using evolutionary ant colony optimization. In: Proceedings of the 2008 Asia and South Pacific design automation conference (ASP-DAC), pp 679–684

  38. Bouchhima A, Gerin P, Petrot F (2009) Automatic instrumentation of embedded software for high level hardware/software co-simulation. In: Proceedings of the 2009 Asia and South Pacific design automation conference (ASP-DAC), pp 546–551

  39. Wolf W (1996) Object-oriented cosynthesis of distributed embedded systems. ACM Trans Des Autom Electron Syst (TODAES) 1(3):301–314

    Article  Google Scholar 

  40. Henkel J, Ernst R (2001) An approach to automated hardware/software partitioning using a flexible granularity that is driven by high-level estimation techniques. IEEE Trans Very Large Scale Integr (VLSI) Syst 9(2):273–289

  41. Gibson D, Kumar R, McCurley KS, Tomkins A (2006) Dense subgraph extraction. In: Cook DJ, Holder LB (eds) Mining graph data. Wiley, Hoboken

    Google Scholar 

  42. Cuzzocrea A, Song IY (2014) Big graph analytics: the state of the art and future research agenda. In: Proceedings of 17th international workshop on data warehousing and OLAP. ACM, pp 99–101

  43. Quamar A, Deshpande A, Lin J (2014) NScale: neighborhood-centric analytics on large graphs. Proc VLDB 7(13):1673–1676

    Article  Google Scholar 

  44. Huan J, Wang W, Prins J, Yang J (2004) Spin: mining maximal frequent subgraphs from graph databases. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 581–586

  45. Youness H, Hassan M, Sakanushi K, Takeuchi Y, Imai M, Salem A, Wahdan AM, Moness M (2009) A high performance algorithm for scheduling and hardware–software partitioning on MPSoCs. In: Proceedings of the 4th international conference on design and technology of integrated systems in nanoscale era (DTIS’09), pp 71–76

  46. Han H, Liu W, Wu J, Jiang G (2013) Efficient algorithm for hardware/software partitioning and scheduling on MPSoC. J Comput 8(1):61–68

    Article  Google Scholar 

Download references

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant No. 61173032, and the Doctoral Fund of Ministry of Education of China under Grant No. 20131201110002.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jigang Wu.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Jiang, G., Wu, J., Lam, SK. et al. Algorithmic aspects of graph reduction for hardware/software partitioning. J Supercomput 71, 2251–2274 (2015). https://doi.org/10.1007/s11227-015-1381-4

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11227-015-1381-4

Keywords

Navigation