Prefabrication and postfabrication architecture exploration for partially reconfigurable VLIW processors

Published: 01 August 2008


Modern application-specific instruction-set processors (ASIPs) face the daunting task of delivering high performance for a wide range of applications. To enhance performance, architectural features such as pipelining and VLIW are often employed in ASIPs, which leads to high design complexity. Integrated ASIP design environments, such as template-based and language-driven approaches, provide an answer to this growing design complexity. At the same time, increasing hardware design costs have motivated processor designers to make the processor highly flexible. Flexibility, in its most effective form, can be introduced to the ASIP by coupling a reconfigurable unit to the base processor. Because of the obvious benefits of this approach, reconfigurable ASIPs (rASIPs) have been designed for years. This design paradigm gained momentum with the advent of coarse-grained FPGAs, where the lack of domain-specific performance common in general-purpose FPGAs is largely overcome by choosing application-dependent basic functional units. These rASIP designs lack a generic flow from high-level specification, resulting in design decisions based on intuition and processor design tools that are hard to retarget. Although partial, template-based approaches for rASIP design exist, a clear design methodology, especially for prefabrication architecture exploration, is not available. To address this issue, this article proposes a high-level specification and design methodology for partially reconfigurable VLIW processors. To show the benefit of this approach, a commercial VLIW processor is used as the base architecture and two application domains are studied for potential performance gain.




    Published In

    ACM Transactions on Embedded Computing Systems  Volume 7, Issue 4
    July 2008
    264 pages


    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 August 2008
    Accepted: 01 March 2008
    Received: 01 August 2007
    Published in TECS Volume 7, Issue 4



    Author Tags

    1. ASIP
    2. VLIW
    3. coarse-grained FPGA


    • Research-article
    • Research
    • Refereed

