Abstract
Media processors running multimedia applications do not always fully exploit the instruction-level parallelism that current register file organizations allow. This paper introduces a novel register file organization, called the multi-shared register file, that eliminates this superfluous instruction scheduling flexibility by reducing the number of read and write ports and partitioning the register file into a special ring structure. A parameterized generic VLIW architecture is used to explore different configurations of the proposed register file structure in terms of estimated silicon area, minimum clock period, estimated power consumption, and multimedia task processing performance. Moreover, a metric closely related to multimedia applications is introduced to study trade-offs between hardware cost and performance. The results show that replacing a monolithic register file with an equivalent multi-shared register file considerably reduces the estimated area and power consumption at the cost of a negligible performance degradation.
Acknowledgements
The authors thank Prof. Dr.-Ing. Holger Blume for his comments during the review process.
Cite this article
Payá-Vayá, G., Martín-Langerwerf, J. & Pirsch, P. A Multi-Shared Register File Structure for VLIW Processors. J Sign Process Syst Sign Image Video Technol 58, 215–231 (2010). https://doi.org/10.1007/s11265-009-0355-2
DOI: https://doi.org/10.1007/s11265-009-0355-2