Generating ASIPs with Reduced Number of Connections to the Register-File

International Journal of Parallel Programming

Abstract

We propose the automatic synthesis of application-specific instruction-set processors (ASIPs). We use pipelined execution of multi-op machine instructions, e.g., \(*(\mathit{reg}1 * \mathit{reg}2) = (*\mathit{reg}3) + (*\mathit{reg}4)\) (C syntax), an instruction with three memory pipeline stages and two arithmetic stages. The problem is as follows: for a given set of loops, find a pipeline configuration and a multi-op ISA that maximize the IPC (instructions per cycle) while minimizing the resource usage and the cost of interconnections to the register file of the resulting CPU. The algorithm is based on finding an efficient cover of a large graph by a small set of convex sub-graphs (called \(g_i\)s) that are consistent with a given set of pipeline units. Unlike previous work, the \(g_i\)s are not synthesized into circuits executed in co-processor mode. Instead, both the \(g_i\)s and the rest of the program are executed by the same set of multi-op pipeline units. In this way we eliminate the overhead associated with the co-processor mode of regular ASIPs while retaining their high IPC values. The main advantage of pipelined execution of multi-op instructions over VLIW instructions is shown to be the lower cost of interconnections between the CPU’s execution units and the register file. Once the pipeline configuration and the cover \(g_1 \cup \cdots \cup g_n = G\) have been computed, the Verilog RTL of the corresponding CPU (extended with branch instructions) is generated and synthesized to an FPGA. The results show that, for a set of selected kernels, the resulting ASIP (called Ocpu) obtains higher IPC values than an equivalent compilation to an ARM CPU while achieving similar clock frequencies.
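The cover described above is built from convex sub-graphs \(g_i\). In the instruction-set-extension literature, a sub-graph is called convex when no dataflow path leaves it and later re-enters it. This property is what lets its operations issue as one instruction without waiting for a result computed outside it. The following minimal C sketch of that test assumes the dataflow graph is a DAG stored as a boolean adjacency matrix. The names `is_convex`, `adj`, `in_sub` and the bound `MAXN` are hypothetical and not taken from the paper.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXN 64 /* hypothetical size bound for this sketch */

/*
 * Convexity test for a candidate sub-graph g of a dataflow DAG.
 * adj[u][v] is true when the DAG has an edge u -> v.
 * in_sub[v] is true when node v belongs to g.
 * g is convex iff no path leaves g and later re-enters it; it suffices to
 * look for such a path whose intermediate nodes all lie outside g.
 */
bool is_convex(int n, bool adj[][MAXN], const bool in_sub[])
{
    assert(n >= 0 && n <= MAXN);
    bool seen[MAXN] = {false}; /* outside nodes already pushed */
    int stack[MAXN];           /* each node is pushed at most once */
    int top = 0;

    /* Seed the search with outside nodes fed directly by some node of g. */
    for (int u = 0; u < n; u++) {
        if (!in_sub[u])
            continue;
        for (int v = 0; v < n; v++) {
            if (adj[u][v] && !in_sub[v] && !seen[v]) {
                seen[v] = true;
                stack[top++] = v;
            }
        }
    }

    /* Walk forward through outside nodes only; reaching g again breaks convexity. */
    while (top > 0) {
        int w = stack[--top];
        for (int v = 0; v < n; v++) {
            if (!adj[w][v])
                continue;
            if (in_sub[v])
                return false;
            if (!seen[v]) {
                seen[v] = true;
                stack[top++] = v;
            }
        }
    }
    return true;
}
```

As an illustration, one possible model of the example instruction uses five nodes: 0 loads `*reg3`, 1 loads `*reg4`, 2 is the addition, 3 is the multiplication `reg1*reg2`, and 4 is the store. Its edges are 0→2, 1→2, 2→4 and 3→4. Under this model, the set {0, 2, 4} is convex. The set {0, 4} is not, because the path 0→2→4 passes through the addition, which lies outside the set.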

Notes

  1. Using Vivado and a Kintex-7 FPGA, we compared the Ocpu (\(p=2\), \(k=5\)) with Amber (a free clone of the ARM-7) and found that the Ocpu required 4.73 W versus 1.2 W for the Amber CPU. This is reasonable, as the Ocpu contains 10 times more functional operations than the Amber.

  2. This uses a technique similar to the one used for Dilworth’s theorem [13], which shows that a DAG G of width K (analogous to a VLIW execution) can be covered by K chains (analogous to a pipeline execution).
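The width/chain relationship in Note 2 can be checked on small dataflow graphs with the classic reduction of minimum chain cover to bipartite matching. After taking the transitive closure, every matched pair (u, v) links u and v into the same chain. The minimum number of chains is therefore n minus the size of a maximum matching, and by Dilworth's theorem this equals the width K. The C sketch below is illustrative only. It is the textbook matching construction, not necessarily the technique the authors adapted, and `min_chain_cover`, `augment` and `MAXN` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXN 64 /* hypothetical size bound for this sketch */

static bool reach[MAXN][MAXN]; /* reach[u][v]: v is reachable from u */
static int match_in[MAXN];     /* match_in[v] = u if pair (u, v) is matched, else -1 */
static bool visited[MAXN];     /* right-side nodes tried in the current search */

/* Kuhn's augmenting-path step: try to match left node u to some successor. */
static bool augment(int n, int u)
{
    for (int v = 0; v < n; v++) {
        if (!reach[u][v] || visited[v])
            continue;
        visited[v] = true;
        if (match_in[v] < 0 || augment(n, match_in[v])) {
            match_in[v] = u;
            return true;
        }
    }
    return false;
}

/* Minimum number of chains covering the DAG given by adj, which by
   Dilworth's theorem equals the width (size of the largest antichain). */
int min_chain_cover(int n, bool adj[][MAXN])
{
    assert(n >= 0 && n <= MAXN);
    for (int u = 0; u < n; u++)
        for (int v = 0; v < n; v++)
            reach[u][v] = adj[u][v];

    /* Warshall's transitive closure: chains may skip intermediate nodes. */
    for (int k = 0; k < n; k++)
        for (int u = 0; u < n; u++)
            if (reach[u][k])
                for (int v = 0; v < n; v++)
                    if (reach[k][v])
                        reach[u][v] = true;

    for (int v = 0; v < n; v++)
        match_in[v] = -1;

    int matched = 0;
    for (int u = 0; u < n; u++) {
        for (int v = 0; v < n; v++)
            visited[v] = false;
        if (augment(n, u))
            matched++;
    }
    /* Each matched pair merges two chains into one. */
    return n - matched;
}
```

For a straight-line dependence chain the function returns 1, which corresponds to pure pipeline execution. For n independent operations it returns n, matching the width a VLIW schedule would need to issue them together.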

References

  1. Aho, A.V., Sethi, R., Ullman, J.D.: Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading, MA (1986)

  2. Atasu, K., Pozzi, L., Ienne, P.: Automatic application-specific instruction-set extensions under microarchitectural constraints. In: Proceedings of the 40th Annual Design Automation Conference (2003)

  3. Di Battista, G., Eades, P., Tamassia, R., Tollis, I.G.: Graph Drawing: Algorithms for the Visualization of Graphs. Prentice Hall PTR, Upper Saddle River (1998)

  4. Ben-Asher, Y., Lipov, I., Tartakovsky, V., Tiv, D.: Using multi-op instructions as a way to generate aggressive ASIPs. In: The 22nd IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM), poster (2014)

  5. Biswas, P., Dutt, N.D.: Code size reduction in heterogeneous-connectivity-based DSPs using instruction set extensions. IEEE Trans. Comput. 54, 1216–1226 (2005)

  6. Biswas, P., Dutt, N.D., Pozzi, L., Ienne, P.: Introduction of architecturally visible storage in instruction set extensions. IEEE Trans. Comput-Aided Des. Integr. Circuits Syst. 26(3), 435–446 (2007)

  7. Bollobás, B., Brightwell, G.: The height of a random partial order: concentration of measure. Ann. Appl. Probab. 2(4), 1009–1018 (1992)

  8. Callahan, T.J., Hauser, J.R., Wawrzynek, J.: The Garp architecture and C compiler. Computer 33, 62–69 (2000)

  9. Chattopadhyay, A., Ahmed, W., Karuri, K., Kammler, D., Leupers, R., Ascheid, G., Meyr, H.: Design space exploration of partially re-configurable embedded processors. In: Proceedings of the Conference on Design, Automation and Test in Europe, DATE’07 (2007)

  10. Clark, N.T., Zhong, H., Mahlke, S.A.: Automated custom instruction generation for domain-specific processor acceleration. IEEE Trans. Comput. 54(10), 1258–1270 (2005)

  11. Cong, J., Fan, Y., Han, G., Zhang, Z.: Application-specific instruction generation for configurable processor architectures. In: Proceedings of the 2004 ACM/SIGDA 12th International Symposium on Field Programmable Gate Arrays (2004)

  12. Cong, J., Han, G., Zhang, Z.: Architecture and compiler optimizations for data bandwidth improvement in configurable processors. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 14(9), 986–997 (2006)

  13. Dilworth, R.P.: A decomposition theorem for partially ordered sets. Ann. Math. 51(1), 161–166 (1950)

  14. Galuzzi, C., Bertels, K.: The instruction-set extension problem: a survey. In: Reconfigurable Computing: Architectures, Tools and Applications, pp. 209–220. Springer (2008)

  15. Hauck, S., Fry, T.W., Hosler, M.M., Kao, J.P.: The Chimaera reconfigurable functional unit. In: Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines (1997)

  16. Jain, M.K., Balakrishnan, M., Kumar, A.: ASIP design methodologies: survey and issues. In: Proceedings of the 14th International Conference on VLSI Design (VLSID’01) (2001)

  17. Kastner, R., Kaplan, A., Memik, S.O., Bozorgzadeh, E.: Instruction generation for hybrid reconfigurable systems. ACM Trans. Des. Autom. Electron. Syst. 7, 605–627 (2002)

  18. Kohler, S., Braunes, J., Spallek, R.G., Sawitzki, S.: Improving code efficiency for reconfigurable VLIW processors. In: Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS 2002). IEEE Computer Society (2002)

  19. Lattner, C., Adve, V.: LLVM: A compilation framework for lifelong program analysis & transformation. In: Proceedings of the 2004 International Symposium on Code Generation and Optimization (CGO’04). Palo Alto, California, March (2004)

  20. Leibson, S.: Designing SOCs with Configured Cores: Unleashing the Tensilica Xtensa and Diamond Cores. Morgan Kaufmann, Los Altos, CA (2006)

  21. Liao, S., Devadas, S., Keutzer, K., Tjiang, S.: Instruction selection using binate covering for code size optimization. In: Conference on Computer-Aided Design, ICCAD-95, pp. 393–399. IEEE (1995)

  22. Peymandoust, A., Pozzi, L., Ienne, P., De Micheli, G.: Automatic instruction set extension and utilization for embedded processors. In: Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures, and Processors (ASAP 2003), pp. 108–118. IEEE (2003)

  23. Pozzi, L., Ienne, P.: Exploiting pipelining to relax register-file port constraints of instruction-set extensions. In: Proceedings of the 2005 International Conference on Compilers, Architectures and Synthesis for Embedded Systems, CASES’05 (2005)

  24. Pricopi, M., Mitra, T.: Bahurupi: a polymorphic heterogeneous multi-core architecture. ACM Trans. Archit. Code Optim. (TACO) 8(4), 22 (2012)

  25. Radhakrishnan, S., Guo, H., Parameswaran, S.: Dual-pipeline heterogeneous ASIP design. In: Proceedings of the 2nd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (2004)

  26. Thite, S.: On covering a graph optimally with induced subgraphs. ArXiv preprint arXiv:cs/0604013 (2006)

  27. VanAken, J.R., Zick, G.L.: The expression processor: a pipelined, multiple-processor architecture. IEEE Trans. Comput. 100(8), 525–536 (1981)

  28. Verkest, D., Van Rompaey, K., Bolsens, I., De Man, H.: CoWare: a design environment for heterogeneous hardware/software systems. Des. Autom. Embed. Syst. 1(4), 357–386 (1996)

  29. Villa, T., Kam, T., Brayton, R.K., Sangiovanni-Vincentelli, A.L.: Explicit and implicit algorithms for binate covering problems. IEEE Trans. Comput-Aided Des. Integr. Circuits Syst. 16(7), 677–691 (1997)

  30. Yu, P., Mitra, T.: Disjoint pattern enumeration for custom instructions identification. In: Proceedings of the International Conference on Field Programmable Logic and Applications (FPL 2007), pp. 273–278. IEEE (2007)

Author information

Correspondence to Yosi Ben Asher.

Additional information

This work is supported by the Israel Ministry of Science, Grant No. 3-10894.

About this article

Cite this article

Ben Asher, Y., Lipov, I., Tartakovsky, V. et al. Generating ASIPs with Reduced Number of Connections to the Register-File. Int J Parallel Prog 45, 1461–1487 (2017). https://doi.org/10.1007/s10766-017-0491-4

  • DOI: https://doi.org/10.1007/s10766-017-0491-4
