Abstract
This paper analyzes the performance of vector-dominated regions of code in numerical and multimedia applications in a superscalar + vector architecture and compares it with an eight-way superscalar processor. The ability to split a program’s execution into scalar and vector regions allows us to show that (1) as expected, the vector unit is much better than the wide-issue superscalar at executing the vector-dominated regions of the code; (2) on the scalar regions, the eight-way superscalar, although better than a four-way superscalar, is clearly not worth the extra complexity in terms of extra transistors and potential cycle-time limitations. Overall, the vector-enhanced superscalar is from 6% to 303% better than an eight-way superscalar. We also present detailed data on the performance of the memory system, which is usually the key limiting factor when running numerical and multi-\break media applications. We evaluate two additional cache designs that try to alleviate problems created by non-unit stride memory references.
Similar content being viewed by others
Author information
Authors and Affiliations
Corresponding authors
Rights and permissions
About this article
Cite this article
Quintana, F., Corbal, J., Espasa, R. et al. A Cost-Effective Architecture for Vectorizable Numerical and Multimedia Applications. Theory Comput Systems 36, 575–593 (2003). https://doi.org/10.1007/s00224-003-1088-4
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00224-003-1088-4