Abstract
This paper addresses the problem of mapping tasks onto an FPGA-based many-core platform whose cores typically have a limited amount of memory and must therefore be frequently overlaid with small program blocks, each implementing a task. To this end, we propose a framework that uses integer linear programming (ILP) to find an optimal mapping of an application onto such a platform at task-level granularity, where optimality is defined within the limits of our ILP model. The framework handles not only applications that fit on the available cores but also larger applications (or even multiple applications) that need more cores than the platform provides. This is achieved by mapping different partitions of the application onto the same set of cores and dynamically overlaying one partition on another during the lifetime of the application. The proposed flow integrates scheduling, binding, and place-and-route into a single mapping step using one ILP formulation. Because solving ILPs is slow, our solution is applicable at design time only. It is implemented with the TOMLAB/CPLEX toolbox, and we assess its efficacy on a set of 40 synthetic task graphs as well as some multimedia applications.
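To make the idea of "optimal mapping with overlays" concrete, the sketch below uses a hypothetical toy model; it is not the authors' ILP formulation. Each task is bound to a (core, partition) slot, where a partition is one overlay round on a core. Two constraints stand in for binding and scheduling: a slot holds at most one task, and a consumer never runs in an earlier partition than its producer. Placement is scored by Manhattan hop distance on a 2D mesh weighted by data volume. The objective is lexicographic: fewest overlay rounds first, then lowest communication cost. The task graph, mesh size, cost model, and all names (`evaluate`, `optimal_mapping`, `MAX_PARTITIONS`) are assumptions made for illustration. Instead of calling an ILP solver such as CPLEX, the sketch enumerates every assignment exhaustively. That only works for tiny inputs, but it returns the same optimum an exact solver would find for the same model.

```python
from itertools import product

# Hypothetical toy instance (not from the paper): edges are
# (producer, consumer, data volume) in a small task graph.
TASKS = ["A", "B", "C", "D", "E"]
EDGES = [("A", "B", 4), ("A", "C", 2), ("B", "D", 3), ("C", "D", 1), ("D", "E", 5)]

MESH_W, MESH_H = 2, 1   # a 2x1 mesh: 2 cores for 5 tasks, so overlays are needed
MAX_PARTITIONS = 4      # overlay rounds each core may go through over time
CORES = [(x, y) for y in range(MESH_H) for x in range(MESH_W)]


def evaluate(mapping):
    """Score mapping[task] = (core, partition) as (overlay rounds, comm cost).

    Returns None if the mapping violates the toy model's constraints:
      * each (core, partition) slot holds at most one task (binding);
      * a consumer never runs in an earlier partition than its producer
        (a crude stand-in for scheduling across overlays).
    """
    if len(set(mapping.values())) != len(mapping):
        return None  # two tasks share one core in the same partition
    comm = 0
    for src, dst, volume in EDGES:
        (core_s, part_s), (core_d, part_d) = mapping[src], mapping[dst]
        if part_d < part_s:
            return None  # precedence violated across overlays
        # Placement cost: Manhattan hop distance on the mesh, weighted by volume.
        hops = abs(core_s[0] - core_d[0]) + abs(core_s[1] - core_d[1])
        comm += volume * hops
    rounds = 1 + max(part for _, part in mapping.values())
    return (rounds, comm)


def optimal_mapping():
    """Exhaustively search all task -> slot assignments (feasible only for tiny inputs).

    Keys are compared lexicographically: fewest overlay rounds first,
    then lowest communication cost.
    """
    slots = [(core, part) for core in CORES for part in range(MAX_PARTITIONS)]
    best, best_key = None, None
    for choice in product(slots, repeat=len(TASKS)):
        mapping = dict(zip(TASKS, choice))
        key = evaluate(mapping)
        if key is not None and (best_key is None or key < best_key):
            best, best_key = mapping, key
    return best, best_key


if __name__ == "__main__":
    best, key = optimal_mapping()
    print("overlay rounds, comm cost:", key)
    for task in TASKS:
        core, part = best[task]
        print(f"task {task}: core {core}, partition {part}")
```

For this instance the search reports `(3, 5)`: three overlay rounds with a communication cost of 5. A four-round mapping with cost 3 also exists, but it loses under the rounds-first objective. The exponential blow-up of the enumeration (8^5 assignments here) echoes why exact methods, including ILP, are practical only at design time.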
Acknowledgements
The authors would like to thank Marcus M. Edvall and Anders Goran for providing specifications and permission to use TOMLAB/CPLEX. Their help greatly eased the development of our mapping flow.
About this article
Cite this article
Shahraki Moghaddam, M., Balakrishnan, M. & Choi, K. Optimal mapping of program overlays onto many-core platforms with limited memory capacity. Des Autom Embed Syst 21, 173–194 (2017). https://doi.org/10.1007/s10617-017-9193-9
DOI: https://doi.org/10.1007/s10617-017-9193-9