Extracting and Improving Microarchitecture Performance on Reconfigurable Architectures

International Journal of Parallel Programming

Abstract

Applications for constrained embedded systems require careful attention to the match between the application and the support offered by an architecture, at the ISA and microarchitecture levels. Generic processors, such as ARM and PowerPC, are inexpensive, but with respect to a given application, they often overprovision in areas that are unimportant for the application’s performance. Moreover, while application-specific, customized logic could dramatically improve the performance of an application, that approach is typically too expensive to justify for most applications. In this paper, we describe our experience using reconfigurable architectures to develop an understanding of an application’s performance and to enhance its performance with respect to customized, constrained logic. We begin with a standard ISA currently in use for embedded systems. We modify its core to measure performance characteristics, obtaining a system that provides cycle-accurate timings and presents results in the style of gprof, but with absolutely no software overhead. We then provide cache-behavior statistics that are typically unavailable in a generic processor. In contrast with simulation, our approach executes the program at full speed and delivers statistics based on the actual behavior of the cache subsystem. Finally, in response to the performance profile developed on our platform, we evaluate various uses of the FPGA-realized instruction and data caches in terms of the application’s performance.

Author information

Corresponding author

Correspondence to Shobana Padmanabhan.

About this article

Cite this article

Padmanabhan, S., Jones, P., Schuehler, D.V. et al. Extracting and Improving Microarchitecture Performance on Reconfigurable Architectures. Int J Parallel Prog 33, 115–136 (2005). https://doi.org/10.1007/s10766-005-3575-5

  • DOI: https://doi.org/10.1007/s10766-005-3575-5
