Power-Performance Trade-Offs in Wide and Clustered VLIW Cores for Numerical Codes

Pericás, Miquel; Ayguadé, Eduard; Zalamea, Javier; Llosa, Josep; Valero, Mateo

doi:10.1007/978-3-540-39707-6_9


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2858)

Included in the following conference series:

International Symposium on High Performance Computing


Abstract.

Instruction-Level Parallelism (ILP) is the main source of performance in numerical applications. Architectural resources and program recurrences are the main limits on the ILP that can be exploited from loops, the most time-consuming part of numerical computations. To increase the issue rate, current designs replicate resources such as memory ports and functional units to a growing degree. However, the high cost of this technique in power, area and clock cycle is making it less attractive.
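
As a rough illustration of how resources and recurrences bound loop ILP, the standard modulo-scheduling lower bound on the initiation interval (II) is the larger of a resource bound and a recurrence bound. The sketch below is not taken from the paper: the function names, the operation mix and the two machine configurations are hypothetical and chosen only to show the effect.

```python
from math import ceil

def res_mii(op_counts, unit_counts):
    """Resource bound: each unit class needs ceil(ops / units) cycles per iteration."""
    return max(ceil(op_counts[k] / unit_counts[k]) for k in op_counts)

def rec_mii(recurrences):
    """Recurrence bound: a dependence cycle needs ceil(latency / distance) cycles."""
    return max((ceil(lat / dist) for lat, dist in recurrences), default=1)

def min_ii(op_counts, unit_counts, recurrences):
    """Lower bound on the initiation interval of a software-pipelined loop."""
    return max(res_mii(op_counts, unit_counts), rec_mii(recurrences))

# Hypothetical loop body: 6 memory operations and 8 floating-point operations.
ops = {"mem": 6, "fpu": 8}
narrow = {"mem": 2, "fpu": 4}   # assumed baseline machine
wide = {"mem": 4, "fpu": 8}     # assumed machine with doubled resources
recs = [(4, 1)]                 # one recurrence: latency 4, distance 1 iteration

print(min_ii(ops, narrow, []), min_ii(ops, wide, []))      # 3 2
print(min_ii(ops, narrow, recs), min_ii(ops, wide, recs))  # 4 4
```

Without the recurrence, doubling the resources lowers the bound from 3 to 2 cycles per iteration. With it, both machines are held at 4 cycles, so the extra ports and units buy nothing for this loop.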

Clustering is a popular technique for decentralizing the design of wide-issue cores so that they can meet technology constraints on cycle time, area and power. Another approach is to use wide functional units. Both techniques reduce the port requirements of the register file and the memory subsystem, but they impose scheduling constraints that may considerably reduce the exploitable ILP.
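
The port reduction can be sketched with a simple count. This is a first-order model under stated assumptions, not the paper's methodology: every functional unit is assumed to need two read ports and one write port, a unit of width w is assumed to replace w scalar units while using the same number of (wider) ports, and `comm_ports` is a hypothetical allowance for inter-cluster transfers.

```python
def rf_ports(units, ports_per_unit=3, clusters=1, width=1, comm_ports=0):
    """Ports on each register file for a given clustering and widening degree.

    units: number of scalar-equivalent functional units in the whole core.
    """
    if units % (clusters * width) != 0:
        raise ValueError("units must divide evenly across clusters and width")
    units_per_file = units // (clusters * width)
    return units_per_file * ports_per_unit + comm_ports

# Hypothetical core with 8 scalar-equivalent functional units.
print(rf_ports(8))                            # 24: monolithic register file
print(rf_ports(8, clusters=4, comm_ports=2))  # 8: four clusters, 2 transfer ports each
print(rf_ports(8, width=2))                   # 12: units of width 2
print(rf_ports(8, clusters=2, width=2))       # 6: both techniques combined
```

The counts say nothing about the scheduling cost: a clustered schedule must place dependent operations near each other or pay for transfers, and a wide unit is only fully used when enough independent operations on adjacent data can be packed together.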

This paper evaluates several VLIW designs that use both techniques, analyzing power, area and performance on loops from the Perfect Club benchmark. From this study we conclude that applying clustering, widening, or both to the same core yields very power-efficient configurations with low area requirements.



Author information

Authors and Affiliations

Departament d’Arquitectura de Computadors, Universitat Politècnica de Catalunya (UPC), Jordi Girona, 1-3. Mòdul D6 Campus Nord, 08034, Barcelona, Spain
Miquel Pericás, Eduard Ayguadé, Javier Zalamea, Josep Llosa & Mateo Valero

Editor information

Editors and Affiliations

Department of Computer Science, University of California (UCI), 3019 Donald Bren Hall, 92697-3435, Irvine, CA, USA
Alex Veidenbaum
Department of Information and Computer Science, Faculty of Science, Nara Women’s University, Kitauoyanishi-machi, Nara-city, 630-8506, Nara, Japan
Kazuki Joe
Keio University, Hiyoshi, Kohoku, Yokohama, 223–8522, Kanagawa, Japan
Hideharu Amano
Tokyo University of Technology, 1404-1 Katakura, Hachioji, 192-0982, Tokyo, Japan
Hideo Aiso

Cite this paper

Pericás, M., Ayguadé, E., Zalamea, J., Llosa, J., Valero, M. (2003). Power-Performance Trade-Offs in Wide and Clustered VLIW Cores for Numerical Codes. In: Veidenbaum, A., Joe, K., Amano, H., Aiso, H. (eds) High Performance Computing. ISHPC 2003. Lecture Notes in Computer Science, vol 2858. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39707-6_9

DOI: https://doi.org/10.1007/978-3-540-39707-6_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20359-9
Online ISBN: 978-3-540-39707-6
eBook Packages: Springer Book Archive