A new merit function for custom instruction selection under an area budget constraint

Kamal, Mehdi; Yazdanbakhsh, Amir; Noori, Hamid; Afzali-Kusha, Ali; Pedram, Massoud

doi:10.1007/s10617-013-9117-2

Published: 17 September 2013

Design Automation for Embedded Systems, Volume 17, pages 1–25 (2013)

Mehdi Kamal, Amir Yazdanbakhsh, Hamid Noori, Ali Afzali-Kusha & Massoud Pedram


Abstract

This paper presents a new merit function for the custom instruction selection phase of the design flow of application-specific instruction-set processors (ASIPs) in the presence of an area budget constraint. In contrast to nearly all previously proposed approaches, where the ratio of the ASIP speed to the layout area is used as the merit function to select the candidate custom instructions (CIs), we show that a merit function based on normalized cycle saving and area can result in better CI selections in terms of the achievable speedup under a given area budget, for both greedy and branch-and-bound techniques. The efficacy of the proposed approach is assessed by comparing the results of using the proposed and conventional merit functions for different benchmarks. The comparison points toward an average (maximum) speed enhancement of 3.65% (27.4%) for the proposed merit function compared to the conventional merit functions.


References

[1] Clark NT, Zhong H, Mahlke S (2005) Automated custom instruction generation for domain-specific processor acceleration. IEEE Trans Comput 54:1258–1270. doi:10.1109/TC.2005.156
[2] Pozzi L, Atasu K, Ienne P (2006) Exact and approximate algorithms for the extension of embedded processor instruction sets. IEEE Trans Comput Aided Des 25:1209–1229. doi:10.1109/TCAD.2005.855950
[3] Keutzer K, Malik S, Newton AR (2002) From ASIC to ASIP: the next design discontinuity. In: Proceedings of international conference on computer design: VLSI in computers and processors, pp 84–90. doi:10.1109/ICCD.2002.1106752
[4] Lu YS, Shen L, Huang LB, Wang ZY, Xiao N (2009) Optimal subgraph covering for customisable VLIW processors. Comput Digit Tech 3:14–23. doi:10.1049/iet-cdt:20070104
[5] Siew-Kei L, Srikanthan T, Clarke CT (2009) Selecting profitable custom instructions for area–time-efficient realization on reconfigurable architectures. IEEE Trans Ind Electron 56:3998–4005. doi:10.1109/TIE.2009.2017091
[6] Bonzini P, Pozzi L (2008) Recurrence-aware instruction set selection for extensible embedded processors. IEEE Trans Very Large Scale Integr (VLSI) Syst 16:1259–1267. doi:10.1109/TVLSI.2008.2001863
[7] Clark N, Hormati A, Mahlke S, Yehia S (2006) Scalable subgraph mapping for acyclic computation accelerators. In: Proceedings of international conference on compilers, architecture and synthesis for embedded systems, pp 147–157. doi:10.1145/1176760.1176779
[8] Atasu K, Ozturan C, Dundar G, Mencer O, Luk W (2008) CHIPS: custom hardware instruction processor synthesis. IEEE Trans Comput-Aided Des Integr Circuits Syst 27(3):528–541. doi:10.1109/TCAD.2008.915536
[9] Biswas P, Banerjee S, Dutt ND, Pozzi L, Ienne P (2006) ISEGEN: generation of high-quality instruction set extensions by iterative improvement. IEEE Trans Very Large Scale Integr (VLSI) Syst 14:754–762. doi:10.1109/DATE.2005.191
[10] Clark N, Zhong H, Mahlke SA (2003) Processor acceleration through automated instruction set customization. In: Proceedings of the 36th annual IEEE/ACM international symposium on microarchitecture, pp 129–141. doi:10.1109/MICRO.2003.1253189
[11] Kastrup B, Bink A, Hoogerbrugge J (1999) ConCISe: a compiler-driven CPLD-based instruction set accelerator. In: Proceedings of the seventh annual IEEE symposium on field-programmable custom computing machines, pp 92–101. doi:10.1109/FPGA.1999.803671
[12] Goodwin D, Petkov D (2003) Automatic generation of application specific processors. In: Proceedings of international conference on compilers, architecture and synthesis for embedded systems, pp 137–147. doi:10.1145/951710.951730
[13] Yazdanbakhsh A, Salehi ME, Fakhraie SM (2010) Architecture-aware graph-covering algorithm for custom instruction selection. In: Proceedings of the 5th international conference on future information technology, pp 1–6. doi:10.1109/FUTURETECH.2010.5482719
[14] Muhammad R, Apvrille L, Pacalet R (2008) Evaluation of ASIPs design with LISATek. Lecture notes in computer science, vol 5114. Springer, Berlin, pp 177–186. doi:10.1007/978-3-540-70550-5_20
[15] The LISATek™ solution: automated embedded processor design and software development tool generation. http://www.coware.com/PDF/products/LISATek.pdf
[16] Biswas P, Dutt N, Ienne P, Pozzi L (2006) Automatic identification of application-specific functional units with architecturally visible storage. In: Proceedings of the conference on design, automation and test in Europe (DATE), pp 212–217. doi:10.1109/DATE.2006.244088
[17] Cheung N, Parameswaran S, Henkel J (2003) INSIDE: instruction selection/identification and design exploration for extensible processors. In: Proceedings of the international conference on computer aided design, pp 291–297. doi:10.1109/ICCAD.2003.1257681
[18] Clark N, Blome J, Chu M, Mahlke S, Biles S, Flautner K (2005) An architecture framework for transparent instruction set customization in embedded processors. In: Proceedings of the 32nd annual international symposium on computer architecture, pp 272–283. doi:10.1109/ISCA.2005.9
[19] Gonzalez RE (2000) XTENSA: a configurable and extensible processor. IEEE Micro 20(2):60–70. doi:10.1109/40.848473
[20] Scharwaechter H, Kammler D, Leupers R, Ascheid G, Meyr H (2011) A retargetable framework for compiler/architecture co-development. Des Autom Embed Syst 15:1–32. doi:10.1007/s10617-011-9080-8
[21] Pan Y, Mitra T (2004) Characterizing embedded applications for instruction-set extensible processors. In: Proceedings of the design automation conference (DAC), pp 723–728
[22] Galuzzi C, Bertels K (2011) The instruction-set extension problem: a survey. ACM Trans Reconfigurable Technol Syst 4(18):1–28. doi:10.1145/1968502.1968509
[23] Liao S, Devadas S (1997) Solving covering problems using LPR-based lower bounds. In: Proceedings of the 34th annual conference on design automation (DAC’97), pp 117–120. doi:10.1145/266021.266046
[24] Peymandoust A, Pozzi L, Ienne P, Micheli GD (2003) Automatic instruction set extension and utilization for embedded processors. In: Proceedings of the 14th international conference on application-specific systems, architectures and processors (ASAP’03), pp 108–118. doi:10.1109/ASAP.2003.1212834
[25] Lam S-K, Srikanthan T (2009) Rapid design of area-efficient custom instructions for reconfigurable embedded processing. J Syst Archit 55(1):1–14
[26] Brisk P, Kaplan A, Sarrafzadeh M (2004) Area-efficient instruction set synthesis for reconfigurable system-on-chip designs. In: Proceedings of the 41st annual design automation conference (DAC), pp 395–400
[27] Zuluaga M, Topham N (2009) Design-space exploration of resource-sharing solutions for custom instruction set extensions. IEEE Trans Comput-Aided Des Integr Circuits Syst 28(12):1788–1801. doi:10.1109/TCAD.2009.2026355
[28] The GNU operating system. www.gnu.org
[29] Nangate 45 nm open cell library (2008). http://www.nangate.com
[30] Ramaswamy R, Wolf T (2003) PacketBench: a tool for workload characterization of network processing. In: Proceedings of IEEE international workshop on workload characterization, pp 42–50. doi:10.1109/WWC.2003.1249056
[31] Guthaus MR, Ringenberg JS, Ernst D, Austin TM, Mudge T, Brown RB (2001) MiBench: a free, commercially representative embedded benchmark suite. In: Proceedings of 4th IEEE international workshop on workload characterization, pp 3–14. doi:10.1109/WWC.2001.15
[32] Lee C, Potkonjak M, Mangione-Smith WH (1997) MediaBench: a tool for evaluating and synthesizing multimedia and communications systems. In: Proceedings of 30th annual IEEE/ACM international symposium on microarchitecture, pp 330–335. doi:10.1109/MICRO.1997.645830


Author information

Amir Yazdanbakhsh
Present address: Department of Electrical and Computer Engineering, University of Wisconsin, Madison, WI, USA
Hamid Noori
Present address: Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran

Authors and Affiliations

School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran
Mehdi Kamal, Amir Yazdanbakhsh, Hamid Noori & Ali Afzali-Kusha
Department of Electrical Engineering, University of Southern California, Los Angeles, CA, USA
Massoud Pedram


Corresponding author

Correspondence to Ali Afzali-Kusha.

Appendix

In this section, we provide the motivation for the proposed merit function through an example. For simplicity and without loss of generality, we make several simplifying assumptions, which are marked by a star in the text below. We assume that, using the exact identification algorithm of [2], all CIs that meet the defined constraints (e.g., the number of inputs and outputs) for an application have been identified. In this example, there are eighteen identified CIs, and similar CIs (based on functional and structural isomorphism) are classified into seven internally similar CI groups.

The conflict graph and the area (A) and cycle saving (CS) factor of each CI group are shown in Fig. 18. Note that the CS values in the graph show the cycle saving of the CIs for a single iteration of some parts of an application. To keep the example simple, we assume that the CIs have no intra-conflict* (i.e., the CIs within each group do not conflict with each other), but they may have inter-conflict* (i.e., conflicts among CIs that belong to different CI groups). An edge between any two nodes in Fig. 18 signifies that all CIs of the corresponding CI groups conflict with each other. This also means that when a CI group is selected, all CI groups (nodes) that share a conflict edge with it must be removed from further consideration.

Now, let us define the parameters CS and CS_Norm for each CI in a CI group (see Fig. 18). The parameter CS denotes the cycle saving factor of each CI in the group (in this example, all CIs of a CI group have the same cycle saving*). This assumption is made only for the motivational example, for the sake of simplicity. The purpose of this simple example is to demonstrate that the proposed merit function may improve the speedup of extensible processors compared to the existing merit functions under the same conditions. Different combinations of cycle savings could be assumed for the CIs in a group, and these combinations would yield different levels of effectiveness for the proposed merit function relative to the existing ones. The results for the efficacy of the proposed function for different combinations of cycle savings in a group are presented in Sect. 5. The parameter CS_Norm is the normalized cycle saving. For the ith CI in a CI group, this value is calculated from

$$ \mathrm{CS}_{\mathit{Norm},i} = \frac{\mathrm{CS}_{i}}{\max \{ \mathrm{CS}_{j} \mid j \in \mathit{Candidate\ CI\ List} \}} \tag{11} $$

We also define the parameters A, A_Norm, and Num_CIs for each CI group. Because all CIs within a CI group use the same CFU, the parameters A and A_Norm denote the area usage and the normalized area usage, respectively, of the CFU used by that CI group. The parameter Num_CIs represents the number of CIs in the CI group. The parameter A_Norm for the ith CI in each CI group is obtained from

$$ A_{\mathit{Norm},i} = \frac{A_{i}}{\max \{ A_{j} \mid j \in \mathit{Candidate\ CI\ List} \}} \tag{12} $$

The normalizations, which are performed using the corresponding maximum values, yield values between 0 and 1 in both cases.
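The normalization in Eqs. (11) and (12) can be sketched in a few lines of Python. This is only an illustration: the function name `normalize_by_max` and the sample cycle savings and areas are assumptions made for this sketch and are not taken from Fig. 18.

```python
def normalize_by_max(values):
    """Divide each value by the largest value in the candidate list,
    mirroring the form of Eqs. (11) and (12)."""
    peak = max(values)
    if peak <= 0:
        raise ValueError("normalization needs a positive maximum")
    return [v / peak for v in values]


# Hypothetical per-CI cycle savings and CFU areas (illustrative only).
cycle_savings = [2, 5, 4, 1]
areas = [3, 11, 9, 4]

cs_norm = normalize_by_max(cycle_savings)
a_norm = normalize_by_max(areas)

print(cs_norm)
print(a_norm)
```

With positive inputs, the largest entry of each list maps to exactly 1 and every other entry falls below 1.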

One of Eqs. (4) or (5) is evaluated for all nodes in the conflict graph, and the node with the highest merit value is selected at each iteration of the selection algorithm. Then, the nodes adjacent to the selected node in the conflict graph are removed. This process continues until no node remains in the conflict graph or the area constraint is violated.
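The selection loop described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: Eqs. (4) and (5) are not reproduced in this section, so the merit function is passed in as a callable, and `ratio_merit`, the group names X, Y, Z, and all numbers are hypothetical stand-ins.

```python
def greedy_select(groups, conflicts, merit, area_budget):
    """Greedy CI-group selection on a conflict graph.

    groups:      dict name -> {"area": number, "cs": number}
    conflicts:   dict name -> set of names sharing a conflict edge
    merit:       callable(group) -> number (stand-in for Eq. (4) or (5))
    area_budget: total area available
    """
    remaining = set(groups)
    budget = area_budget
    selected = []
    while remaining:
        # Pick the highest-merit node; sorting makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda name: merit(groups[name]))
        if groups[best]["area"] > budget:
            break  # selecting it would violate the area constraint
        selected.append(best)
        budget -= groups[best]["area"]
        # Remove the selected node and every node adjacent to it.
        remaining -= conflicts.get(best, set()) | {best}
    return selected, budget


def ratio_merit(group):
    """Hypothetical merit: cycle saving per unit area."""
    return group["cs"] / group["area"]


# Hypothetical three-group example (not the data of Fig. 18).
groups = {
    "X": {"area": 3, "cs": 6},
    "Y": {"area": 4, "cs": 4},
    "Z": {"area": 5, "cs": 5},
}
conflicts = {"X": {"Y"}, "Y": {"X"}}

selected, leftover = greedy_select(groups, conflicts, ratio_merit, area_budget=8)
print(selected, leftover)
```

The loop stops as soon as the best remaining node no longer fits, which is a literal reading of "until the area constraint is violated"; a variant that first discards nodes that do not fit would continue with smaller candidates instead.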

In this example, we assume that the area budget is equal to 13 units. To select the CIs from the candidate set, we use the greedy approach. Note that since the design space of this example is very small, we could have used the branch-and-bound technique to obtain the optimal CIs. The selected groups are depicted in Fig. 19. The merit values are calculated using the A_Norm and CS_Norm values of the CIs. First, we consider the case of the CSPA merit function. In the first iteration, CI group A is selected because its merit value is the highest (4.40). Because of their conflict with node A, CI groups B, D, E, and G are removed. After this step, the remaining area is 10 units (13 − area(A) = 10). In the next (and last) iteration, group F is selected from the remaining CI groups (C and F). Because F conflicts with group C, group C is removed, and F is the last selected group. After selecting these groups, the final cycle saving may be calculated as

$$ \mathrm{CS}_{\mathrm{CSPA}} = \left( \mathrm{CS}_{\mathit{Norm},A} \times \mathit{Num\_CIs}_{A} + \mathrm{CS}_{\mathit{Norm},F} \times \mathit{Num\_CIs}_{F} \right) \times \mathrm{CS}_{\max} \tag{13} $$

where CS_max is the maximum CS among all the identified CIs.

If the CyS merit function is used, only group B is selected. The CS of this group is 16 (∑CS_Norm = 3.2), which is greater than that of any other CI group. By selecting group B, groups A, C, and G must be removed due to conflicts. Also, since the remaining area budget is small (13 − area(B) = 2), no other CI group can be selected.
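The two figures quoted for group B can be related through the same form as Eq. (13). The step below is an inference from those two figures rather than a value stated in the text; under that reading, the example would have CS_max = 5:

$$ \mathrm{CS}_{\mathrm{CyS}} = \Big( \sum \mathrm{CS}_{\mathit{Norm}} \Big) \times \mathrm{CS}_{\max} = 3.2 \times \mathrm{CS}_{\max} = 16 \quad \Rightarrow \quad \mathrm{CS}_{\max} = \frac{16}{3.2} = 5 $$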

In this example, a maximum cycle saving of 16 is achieved using 11 area units (two area units remain unused). However, the optimal answer to this problem is to select groups E and G, which yields a cycle saving of 21 and uses the entire area budget. This shows that using CSPA or CyS as the merit function does not necessarily lead to the optimal solution. The reason is that CIs with few primitive nodes (such as adders and shifters) and small areas have a higher priority for selection because of their larger CSPAs. On the other hand, CIs with many nodes usually have a higher cycle saving, but their large number of nodes (and large area) leads to many conflicts with other CIs. Using CSPA as the merit value can therefore result in selecting CIs with few nodes and a lower CS instead of CIs with many nodes but lower CSPAs. The two consequences of using CSPA as a merit function are the selection of low-CS CIs with few nodes and the removal of CIs with many nodes (and higher CSs) because of conflicts with the previously selected CIs.

Now, let us consider the CyS merit function, which selects the CIs with the higher CS without considering the area budget usage. In this case, after each CI selection, the available area budget is first updated, and then the CIs whose areas are larger than the updated area budget are removed from the candidate set. As mentioned before, the CIs with higher CSs usually have larger areas and normally more conflicts with other CIs. Both factors limit the choices available for selecting the next CI and, hence, reduce the chance of increasing the speedup much further.
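The candidate-pruning step described for CyS can be sketched as below. The helper name `update_candidates` is hypothetical. The areas are not read from Fig. 18; they are inferred from the budget arithmetic quoted in the text (13 − area(A) = 10, 13 − area(B) = 2, 13 − area(E) = 4, and a remaining budget of 4 that drops to zero when G is selected), and groups whose areas cannot be inferred this way are left out.

```python
def update_candidates(areas, conflicts, chosen, budget):
    """After selecting `chosen`, update the area budget, then drop the chosen
    group, its conflicting groups, and every group that no longer fits."""
    budget -= areas[chosen]
    survivors = {
        name: area
        for name, area in areas.items()
        if name != chosen
        and name not in conflicts.get(chosen, set())
        and area <= budget
    }
    return budget, survivors


# Areas inferred from the text (assumption: Fig. 18 agrees with them).
areas = {"A": 3, "B": 11, "E": 9, "G": 4}
# Conflicts of group B as listed in the text.
conflicts = {"B": {"A", "C", "G"}}

budget, survivors = update_candidates(areas, conflicts, "B", 13)
print(budget, survivors)
```

Among these four groups, nothing survives the selection of B: A and G are removed by conflict, and E no longer fits in the two remaining area units.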

In the case of the proposed merit function, group E, which is the best CI group, is selected in the first iteration (see Fig. 19(c)). After this selection, groups A, D, and F are removed due to conflicts, and the area budget is reduced from 13 to 4 (13 − area(E) = 4). Hence, in the next iteration, the selection must be made between groups C and G. The merit value of group G is greater than that of group C, so G is selected in the second iteration. After this selection, the area budget is reduced to zero, which terminates the selection phase. Hence, for this example, the performance gain of the proposed merit function is better than that of the conventional merit functions.
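The area bookkeeping for this selection can be written out explicitly. The individual areas are inferred from the budget figures in the text, so the lines below should be read as a consistency check rather than as data from Fig. 18:

$$ \mathrm{area}(E) = 13 - 4 = 9, \qquad \mathrm{area}(G) = 4 - 0 = 4, \qquad \mathrm{area}(E) + \mathrm{area}(G) = 13 $$

Compared with the cycle saving of 16 obtained when only group B is selected, the cycle saving of 21 quoted for groups E and G is larger by

$$ 21 - 16 = 5 $$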

About this article

Cite this article

Kamal, M., Yazdanbakhsh, A., Noori, H. et al. A new merit function for custom instruction selection under an area budget constraint. Des Autom Embed Syst 17, 1–25 (2013). https://doi.org/10.1007/s10617-013-9117-2


Received: 25 September 2012
Accepted: 17 July 2013
Published: 17 September 2013
Issue Date: March 2013
DOI: https://doi.org/10.1007/s10617-013-9117-2
