Abstract
Software pipelining for instruction-level parallel computers with non-blocking caches usually assigns memory access latency by assuming either that all accesses are cache hits or that all are cache misses. We contend that setting memory latencies via cache-reuse analysis leads to better software pipelining than either an all-hit or an all-miss assumption. Using a simple cache-reuse model, our software pipelining optimization achieved 10% better execution performance than an all-cache-hit assumption and used 18% fewer registers than an all-cache-miss assumption requires. We conclude that software pipelining for architectures with non-blocking caches should incorporate a memory-reuse model.
This work was partially supported by the National Science Foundation under grants CCR-9409341 and CCR-9308348, as well as a grant from Digital Equipment.
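As a rough illustration of the idea described in the abstract, the sketch below shows one way a compiler might choose a per-load latency for modulo scheduling from a simple cache-reuse classification, rather than a uniform all-hit or all-miss latency. The names, classification categories, and latency values are hypothetical and are not taken from the paper.

```c
/* Hypothetical sketch: pick per-load latencies from a simple
 * cache-reuse classification before modulo scheduling.
 * Names and cycle counts are illustrative, not from the paper. */
#include <stdio.h>

typedef enum { NO_REUSE, SPATIAL_REUSE, TEMPORAL_REUSE } reuse_t;

/* Assumed machine latencies (cycles) for a non-blocking cache. */
enum { HIT_LATENCY = 2, MISS_LATENCY = 20 };

/* A load expected to hit (temporal reuse, or spatial reuse within a
 * cache line) is scheduled with the hit latency; a load with no
 * predicted reuse is scheduled with the miss latency. */
static int assumed_latency(reuse_t reuse) {
    switch (reuse) {
    case TEMPORAL_REUSE:
    case SPATIAL_REUSE:
        return HIT_LATENCY;
    default:
        return MISS_LATENCY;
    }
}

int main(void) {
    /* Example: three loop references classified by a reuse model. */
    const char *name[] = { "a[i]", "a[i+1]", "b[i*stride]" };
    reuse_t cls[] = { SPATIAL_REUSE, TEMPORAL_REUSE, NO_REUSE };
    for (int i = 0; i < 3; i++)
        printf("%-12s scheduled latency = %d cycles\n",
               name[i], assumed_latency(cls[i]));
    return 0;
}
```

The scheduler would then use these per-reference latencies when building the data dependence graph for modulo scheduling, so that predicted hits do not inflate register pressure and predicted misses do not stall the pipelined loop.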
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ding, C., Carr, S., Sweany, P. (1997). Modulo scheduling with cache reuse information. In: Lengauer, C., Griebl, M., Gorlatch, S. (eds) Euro-Par'97 Parallel Processing. Euro-Par 1997. Lecture Notes in Computer Science, vol 1300. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0002856
DOI: https://doi.org/10.1007/BFb0002856
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63440-9
Online ISBN: 978-3-540-69549-3