Optimal mapping of program overlays onto many-core platforms with limited memory capacity

Design Automation for Embedded Systems

Abstract

This paper addresses the problem of mapping tasks onto an FPGA-based many-core platform whose cores typically have a limited amount of memory and must therefore be frequently overlaid with small program blocks, each implementing a task. We propose a framework that uses integer linear programming (ILP) to find an optimal mapping of an application onto such a platform at task-level granularity, where optimality is defined within the limits of our ILP model. The framework suits not only an application that fits on the available cores but also a larger application (or even multiple applications) that needs more cores than the platform provides. This is achieved by mapping different partitions of the application to the same set of cores and dynamically, during the lifetime of the application, overlaying one partition on another. The proposed mapping flow integrates the scheduling, binding, and place-and-route steps into a single mapping process using an ILP formulation. Because solving the ILP is slow, the approach is applicable at design time only. It is implemented using the TOMLAB/CPLEX toolbox, and we assess its efficacy on a set of 40 synthetic task graphs as well as some multimedia applications.
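
The partition-and-overlay idea can be illustrated with a toy model. The sketch below is not the paper's ILP formulation and does not use TOMLAB/CPLEX. It assumes that each core holds one task per partition, that a producer task's partition must not come after its consumer's, and that the only objective is the number of partitions used. It solves this 0-1 assignment by exhaustive enumeration, which is practical only for very small task graphs. The function name `map_with_overlays` and its parameters are hypothetical.

```python
from itertools import permutations, product


def map_with_overlays(num_tasks, edges, num_cores, max_partitions):
    """Return (partitions_used, assignment), or None if no mapping exists.

    assignment[t] is the (partition, core) slot given to task t.
    edges is a list of (producer, consumer) task-index pairs.
    """
    # Every (partition, core) pair is one slot that can host one task.
    slots = list(product(range(max_partitions), range(num_cores)))
    best = None
    # permutations() only yields distinct slots, so no two tasks share one.
    for assignment in permutations(slots, num_tasks):
        # Precedence: a producer may not sit in a later partition
        # than its consumer.
        if any(assignment[u][0] > assignment[v][0] for u, v in edges):
            continue
        used = max(p for p, _ in assignment) + 1
        if best is None or used < best[0]:
            best = (used, assignment)
    return best


# A four-task chain on two cores needs two partitions overlaid in turn.
chain_edges = [(0, 1), (1, 2), (2, 3)]
result = map_with_overlays(4, chain_edges, num_cores=2, max_partitions=3)
print(result[0])  # 2
```

A real formulation would replace the enumeration with binary decision variables and linear constraints handed to an ILP solver, and would also model timing, routing, and reconfiguration cost, none of which this sketch attempts.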

Acknowledgements

The authors would like to thank Marcus M. Edvall and Anders Goran for providing specifications and permission to use TOMLAB/CPLEX. Their great help made it easy to develop our mapping flow.

Author information

Corresponding author

Correspondence to Kiyoung Choi.

About this article

Cite this article

Shahraki Moghaddam, M., Balakrishnan, M. & Choi, K. Optimal mapping of program overlays onto many-core platforms with limited memory capacity. Des Autom Embed Syst 21, 173–194 (2017). https://doi.org/10.1007/s10617-017-9193-9

  • DOI: https://doi.org/10.1007/s10617-017-9193-9
