Generating ASIPs with Reduced Number of Connections to the Register-File

Ben Asher, Yosi; Lipov, Irina; Tartakovsky, Vladislav; Tiv, Dror

doi:10.1007/s10766-017-0491-4

Published: 13 February 2017

Volume 45, pages 1461–1487, (2017)

International Journal of Parallel Programming

Yosi Ben Asher ORCID: orcid.org/0000-0001-9963-1467¹,
Irina Lipov²,
Vladislav Tartakovsky¹ &
…
Dror Tiv³

Abstract

We propose automatic synthesis of application-specific instruction set processors (ASIPs). We use pipeline execution of multi-op machine instructions, e.g., $*(\mathit{reg1} * \mathit{reg2}) = (*\mathit{reg3}) + (*\mathit{reg4})$ (in C syntax), an instruction with three memory pipeline stages and two arithmetic stages. The problem is, for a given set of loops, to find a pipeline configuration and a multi-op ISA that maximize the IPC (instructions per cycle) while minimizing the resource usage and the cost of interconnections to the register file of the resulting CPU. The algorithm is based on finding an efficient cover of a large graph by a small set of convex sub-graphs (called $g_i$s) that are consistent with a given set of pipeline units. Unlike in previous works, the $g_i$s are not synthesized to circuits that are executed in a co-processor mode; rather, both the $g_i$s and the rest of the program are executed by the same set of multi-op pipeline units. In this way we eliminate the overhead associated with the co-processor mode of regular ASIPs while maintaining the high IPC values of these ASIPs. The main advantage of pipeline execution of multi-op instructions over VLIW instructions is shown to be the cost of interconnections between the CPU's execution units and the register file. Once the pipeline configuration and the cover $g_1 \cup \cdots \cup g_n = G$ have been computed, the Verilog RTL of the corresponding CPU (extended with branch instructions) is generated and synthesized to an FPGA. The results show that, for a set of selected kernels, the resulting ASIP (called Ocpu) obtains higher IPC values compared to an equivalent compilation to an ARM CPU while obtaining similar clock frequencies.
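The convexity requirement on the sub-graphs $g_i$ can be illustrated with a small sketch. This is not the authors' algorithm. It assumes the common definition of convexity used in the instruction-set-extension literature (no path between two nodes of the sub-graph may pass through a node outside it), and it models the example instruction from the abstract as a data-flow DAG whose node names (`ld3`, `ld4`, `mul`, `add`, `st`) are hypothetical labels chosen here.

```python
# Data-flow DAG for *(reg1*reg2) = (*reg3) + (*reg4).
# Node names are illustrative assumptions, not taken from the paper.
dag = {
    "ld3": ["add"],   # load *reg3
    "ld4": ["add"],   # load *reg4
    "mul": ["st"],    # reg1 * reg2 (address computation)
    "add": ["st"],    # (*reg3) + (*reg4)
    "st":  [],        # store the sum at the computed address
}

def reachable(graph, src):
    """Return all nodes reachable from src by one or more edges."""
    seen, stack = set(), list(graph[src])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen

def is_convex(graph, sub):
    """True if no path between two nodes of sub leaves sub (assumed definition)."""
    sub = set(sub)
    reach = {n: reachable(graph, n) for n in graph}
    for outside in set(graph) - sub:
        entered_from_sub = any(outside in reach[u] for u in sub)
        returns_to_sub = bool(reach[outside] & sub)
        if entered_from_sub and returns_to_sub:
            return False
    return True

print(is_convex(dag, {"ld3", "add", "st"}))  # candidate that keeps the path intact
print(is_convex(dag, {"ld3", "st"}))         # candidate that skips over "add"
```

The first candidate is convex under this definition, while the second is not, because the only path from `ld3` to `st` runs through the excluded node `add`.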

Notes

Using Vivado + Kintex-7, we compared the Ocpu ($p=2$, $k=5$) with Amber (a free clone of the ARM-7) and found that the Ocpu required 4.73 W versus 1.2 W for the Amber CPU. This is reasonable, as the Ocpu contains 10 times more functional operations than the Amber.
This uses a technique similar to the one used in Dilworth's theorem [13], where it was shown that a DAG $G$ of width $K$ (analogous to a VLIW execution) can be covered by $K$ chains (analogous to a pipeline execution).
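The chain-cover statement in the note above can be made concrete with a short sketch. It is not the authors' method. It uses the standard reduction from minimum chain cover to bipartite matching on the transitive closure of the DAG, with a simple augmenting-path matching; the example graph and its node names are assumptions made here for illustration.

```python
# Hypothetical data-flow DAG (same shape as a two-load, multiply, add, store instruction).
dag = {"ld3": ["add"], "ld4": ["add"], "mul": ["st"], "add": ["st"], "st": []}

def transitive_closure(graph):
    """Map each node to the set of all its descendants (graph must be acyclic)."""
    closure = {}
    def visit(node):
        if node not in closure:
            descendants = set()
            for child in graph[node]:
                descendants.add(child)
                descendants |= visit(child)
            closure[node] = descendants
        return closure[node]
    for node in graph:
        visit(node)
    return closure

def min_chain_cover(graph):
    """Cover the DAG with the fewest chains via bipartite matching on the closure."""
    closure = transitive_closure(graph)
    pred = {}  # matched successor -> its predecessor within a chain

    def try_augment(u, seen):
        for v in closure[u]:
            if v not in seen:
                seen.add(v)
                if v not in pred or try_augment(pred[v], seen):
                    pred[v] = u
                    return True
        return False

    for u in graph:
        try_augment(u, set())

    succ = {u: v for v, u in pred.items()}
    chains = []
    for head in (n for n in graph if n not in pred):  # chain heads have no predecessor
        chain = [head]
        while chain[-1] in succ:
            chain.append(succ[chain[-1]])
        chains.append(chain)
    return chains

chains = min_chain_cover(dag)
print(len(chains), chains)
```

For this graph the largest set of mutually independent nodes is `{ld3, ld4, mul}`, so the width is 3 and the cover consists of 3 chains. Which nodes end up in which chain may vary between runs, since set iteration order is not fixed.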

References

1. Aho, A.V., Sethi, R., Ullman, J.D.: Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading, MA (1986)
2. Atasu, K., Pozzi, L., Ienne, P.: Automatic application-specific instruction-set extensions under microarchitectural constraints. In: Proceedings of the 40th Annual Design Automation Conference (2003)
3. Battista, G.D., Eades, P., Tamassia, R., Tollis, I.G.: Graph Drawing: Algorithms for the Visualization of Graphs. Prentice Hall PTR, Upper Saddle River (1998)
4. Ben-Asher, Y., Lipov, I., Tartakovsky, V., Tiv, D.: Using multi-op instructions as a way to generate aggressive ASIPs. In: The 22nd IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM (poster) (2014)
5. Biswas, P., Dutt, N.D.: Code size reduction in heterogeneous-connectivity-based DSPs using instruction set extensions. IEEE Trans. Comput. 54, 1216–1226 (2005)
6. Biswas, P., Dutt, N.D., Pozzi, L., Ienne, P.: Introduction of architecturally visible storage in instruction set extensions. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 26(3), 435–446 (2007)
7. Bollobás, B., Brightwell, G.: The height of a random partial order: concentration of measure. Ann. Appl. Probab. 2(4), 1009–1018 (1992)
8. Callahan, T.J., Hauser, J.R., Wawrzynek, J.: The Garp architecture and C compiler. Computer 33, 62–69 (2000)
9. Chattopadhyay, A., Ahmed, W., Karari, K., Kammler, D., Leupers, R., Ascheid, G., Meyr, H.: Design space exploration of partially re-configurable embedded processors. In: Proceedings of the Conference on Design, Automation and Test in Europe, DATE'07 (2007)
10. Clark, N.T., Zhong, H., Mahlke, S.A.: Automated custom instruction generation for domain-specific processor acceleration. IEEE Trans. Comput. 54(10), 1258–1270 (2005)
11. Cong, J., Fan, Y., Han, G., Zhang, Z.: Application-specific instruction generation for configurable processor architectures. In: Proceedings of the 2004 ACM/SIGDA 12th International Symposium on Field Programmable Gate Arrays (2004)
12. Cong, J., Han, G., Zhang, Z.: Architecture and compiler optimizations for data bandwidth improvement in configurable processors. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 14(9), 986–997 (2006)
13. Dilworth, R.P.: A decomposition theorem for partially ordered sets. Ann. Math. 51(1), 161–166 (1950)
14. Galuzzi, C., Bertels, K.: The instruction-set extension problem: a survey. In: Reconfigurable Computing: Architectures, Tools and Applications, pp. 209–220. Springer (2008)
15. Hauck, S., Fry, T.W., Hosler, M.M., Kao, J.P.: The Chimaera reconfigurable functional unit. In: Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines (1997)
16. Jain, M.K., Balakrishnan, M., Kumar, A.: ASIP design methodologies: survey and issues. In: Proceedings of the 14th International Conference on VLSI Design (VLSID'01) (2001)
17. Kastner, R., Kaplan, A., Memik, S.O., Bozorgzadeh, E.: Instruction generation for hybrid reconfigurable systems. ACM Trans. Des. Autom. Electron. Syst. 7, 605–627 (2002)
18. Kohler, S., Braunes, J., Spallek, R.G., Sawitzki, S.: Improving code efficiency for reconfigurable VLIW processors. In: IEEE Computer Society (IPDPS.2002) (2002)
19. Lattner, C., Adve, V.: LLVM: A compilation framework for lifelong program analysis & transformation. In: Proceedings of the 2004 International Symposium on Code Generation and Optimization (CGO'04). Palo Alto, California, March (2004)
20. Leibson, S.: Designing SOCs with Configured Cores: Unleashing the Tensilica Xtensa and Diamond Cores. Morgan Kaufmann, Los Altos, CA (2006)
21. Liao, S., Devadas, S., Keutzer, K., Tjiang, S.: Instruction selection using binate covering for code size optimization. In: Conference on Computer-Aided Design, ICCAD-95, pp. 393–399. IEEE (1995)
22. Peymandoust, A., Pozzi, L., Ienne, P., De Micheli, G.: Automatic instruction set extension and utilization for embedded processors. In: Application-Specific Systems, Architectures, and Processors, 2003. Proceedings. IEEE International Conference on, pp. 108–118. IEEE (2003)
23. Pozzi, L., Ienne, P.: Exploiting pipelining to relax register-file port constraints of instruction-set extensions. In: Proceedings of the 2005 International Conference on Compilers, Architectures and Synthesis for Embedded Systems, CASES'05 (2005)
24. Pricopi, M., Mitra, T.: Bahurupi: a polymorphic heterogeneous multi-core architecture. ACM Trans. Archit. Code Optim. (TACO) 8(4), 22 (2012)
25. Radhakrishnan, S., Guo, H., Parameswaran, S.: Dual-pipeline heterogeneous ASIP design. In: Proceedings of the 2nd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (2004)
26. Thite, S.: On covering a graph optimally with induced subgraphs. ArXiv preprint arXiv:cs/0604013 (2006)
27. VanAken, J.R., Zick, G.L.: The expression processor: a pipelined, multiple-processor architecture. IEEE Trans. Comput. 100(8), 525–536 (1981)
28. Verkest, D., Van Rompaey, K., Bolsens, I., De Man, H.: CoWare design environment for heterogeneous hardware/software systems. Des. Autom. Embed. Syst. 1(4), 357–386 (1996)
29. Villa, T., Kam, T., Brayton, R.K., Sangiovanni-Vincentelli, A.L.: Explicit and implicit algorithms for binate covering problems. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 16(7), 677–691 (1997)
30. Yu, P., Mitra, T.: Disjoint pattern enumeration for custom instructions identification. In: Field Programmable Logic and Applications, 2007. FPL 2007. International Conference on, pp. 273–278. IEEE (2007)

Author information

Authors and Affiliations

Department of Computer Science, University of Haifa, Haifa, Israel
Yosi Ben Asher & Vladislav Tartakovsky
IBM HRL, Haifa, Israel
Irina Lipov
Intel Labs (IDC), Matam, Haifa, Israel
Dror Tiv

Corresponding author

Correspondence to Yosi Ben Asher.

Additional information

This work is supported by the Israel Ministry of Science, Grant No. 3-10894.

About this article

Cite this article

Ben Asher, Y., Lipov, I., Tartakovsky, V. et al. Generating ASIPs with Reduced Number of Connections to the Register-File. Int J Parallel Prog 45, 1461–1487 (2017). https://doi.org/10.1007/s10766-017-0491-4

Received: 16 May 2016
Accepted: 19 January 2017
Published: 13 February 2017
Issue Date: December 2017
DOI: https://doi.org/10.1007/s10766-017-0491-4
