ABSTRACT
Traditional list schedulers order instructions based on an optimistic estimate of the load delay imposed by the implementation. They therefore cannot respond to variations in load latencies (due to cache hits or misses, congestion in the memory interconnect, etc.) and cannot easily be applied across different implementations. We have developed an alternative algorithm, known as balanced scheduling, that schedules instructions based on an estimate of the amount of instruction-level parallelism in the program. Since scheduling decisions are program-based rather than machine-based, balanced scheduling is unaffected by implementation changes. Since it is based on the amount of instruction-level parallelism that a program can support, it responds better to variations in load latencies. Performance improvements over a traditional list scheduler on a Fortran workload, simulated on several different machine types (cache-based workstations, large parallel machines with a multipath interconnect, and a combination of the two, all with non-blocking processors), are quite good, averaging between 3% and 18%.
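The core idea can be illustrated with a small sketch. The following is not the paper's exact algorithm, but a minimal list scheduler in its spirit: instead of charging every load a fixed machine latency, each load is weighted by how much independent work the program itself offers to hide that latency, and a greedy critical-path scheduler then issues instructions using those program-derived weights. The dependence DAG, node names, and single-issue timing model below are all hypothetical simplifications.

```python
from collections import defaultdict

def reachable(edges, start):
    """All nodes reachable from `start` via `edges` (exclusive of start)."""
    seen, stack = set(), [start]
    while stack:
        for m in edges.get(stack.pop(), ()):
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return seen

def balanced_schedule(nodes, succs, loads):
    preds = defaultdict(set)
    for u in nodes:
        for v in succs.get(u, ()):
            preds[v].add(u)

    # Program-based weight: a load that many independent instructions could
    # overlap gets a larger weight, so the scheduler leaves more room after
    # it. Non-loads get unit weight. (Simplified stand-in for the paper's
    # load-weighting scheme.)
    def weight(n):
        if n not in loads:
            return 1
        dependent = reachable(succs, n) | reachable(preds, n) | {n}
        return 1 + (len(nodes) - len(dependent))

    # Priority = longest weighted path to any leaf of the dependence DAG.
    prio = {}
    def priority(n):
        if n not in prio:
            prio[n] = weight(n) + max((priority(s) for s in succs.get(n, ())),
                                      default=0)
        return prio[n]

    # Greedy single-issue list scheduling: each cycle, issue the highest-
    # priority instruction whose operands are assumed available.
    finish, order, done, cycle = {}, [], set(), 0
    while len(order) < len(nodes):
        avail = [n for n in nodes
                 if n not in done and preds[n] <= done
                 and all(finish[p] <= cycle for p in preds[n])]
        if avail:
            pick = max(avail, key=priority)
            order.append(pick)
            done.add(pick)
            finish[pick] = cycle + weight(pick)
        cycle += 1
    return order

# Hypothetical DAG: two loads feed an add, with one independent multiply.
nodes = ["ld1", "ld2", "mul", "add", "st"]
succs = {"ld1": ["add"], "ld2": ["add"], "add": ["st"]}
print(balanced_schedule(nodes, succs, loads={"ld1", "ld2"}))
# → ['ld1', 'ld2', 'mul', 'add', 'st']
```

Because the loads' weights come from the available parallelism rather than a fixed machine latency, the independent `mul` is pulled between the loads and their consumer, filling slots that would otherwise stall regardless of whether the loads hit or miss in the cache.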
Index Terms
- Balanced scheduling: instruction scheduling when memory latency is uncertain