Communication scheduling

Authors:
Peter Mattson, William J. Dally, Scott Rixner, Ujval J. Kapasi, and John D. Owens
Computer Systems Laboratory, Stanford University, Stanford, CA

ASPLOS IX: Proceedings of the ninth international conference on Architectural support for programming languages and operating systems, November 2000, pages 82–92. https://doi.org/10.1145/378993.379005

Published: 12 November 2000


ABSTRACT

The high arithmetic rates of media processing applications require architectures with tens to hundreds of functional units, multiple register files, and explicit interconnect between functional units and register files. Communication scheduling enables scheduling for these emerging architectures, including those that use shared buses and register file ports. Scheduling for these shared-interconnect architectures is difficult because it requires simultaneously allocating functional units to operations and allocating buses and register file ports to the communications between operations. Prior VLIW scheduling algorithms are limited to clustered register file architectures with no shared buses or register file ports. Communication scheduling extends the range of target architectures by making each communication explicit and decomposing it into three components: a write stub, zero or more copy operations, and a read stub. Communication scheduling allows media processing kernels to achieve 98% of the performance of a central register file architecture on a distributed register file architecture with only 9% of the area, 6% of the power consumption, and 37% of the access delay. It also allows them to achieve 120% of the performance of a clustered register file architecture on a distributed register file architecture with 56% of the area and 50% of the power consumption.
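The decomposition named in the abstract (a write stub, zero or more copy operations, and a read stub, each competing for shared buses and register file ports) can be illustrated with a small greedy sketch. It is not the paper's algorithm. It assumes a hypothetical machine in which every register file has exactly one read port and one write port (written `rf.r` and `rf.w`), every component holds its resources for exactly one cycle, and a value written in cycle t can be read from cycle t + 1. The names `Stub`, `ReservationTable`, `first_free`, and `schedule_communication` are illustrative only. A full scheduler would also assign operations to functional units at the same time and would backtrack instead of simply giving up when a route does not fit.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Resources = tuple[str, ...]


@dataclass(frozen=True)
class Stub:
    """One component of a decomposed communication; holds its resources for one cycle."""
    kind: str          # "write", "copy", or "read"
    cycle: int
    resources: Resources


class ReservationTable:
    """Records which shared resources (buses, register file ports) are taken in each cycle."""

    def __init__(self) -> None:
        self.busy: set[tuple[int, str]] = set()

    def is_free(self, cycle: int, resources: Resources) -> bool:
        return all((cycle, r) not in self.busy for r in resources)

    def claim(self, cycle: int, resources: Resources) -> None:
        for r in resources:
            self.busy.add((cycle, r))


def first_free(table: ReservationTable, cycle: int, buses: list[str],
               uses: Callable[[str], Resources]) -> Optional[Resources]:
    """Try each bus in order; return the first resource set that is entirely free."""
    for bus in buses:
        candidate = uses(bus)
        if table.is_free(cycle, candidate):
            return candidate
    return None


def schedule_communication(table: ReservationTable, write_cycle: int, read_cycle: int,
                           path: list[str], buses: list[str]) -> Optional[list[Stub]]:
    """Greedily place a write stub, len(path) - 1 copies, and a read stub.

    path[0] is the register file written by the producer's write stub, path[-1]
    is read by the consumer's read stub, and each adjacent pair needs one copy.
    Nothing is claimed unless every component fits; on failure returns None.
    """
    # Write stub: producer output -> bus -> write port of the first register file.
    res = first_free(table, write_cycle, buses, lambda b: (b, f"{path[0]}.w"))
    if res is None:
        return None
    stubs = [Stub("write", write_cycle, res)]

    cycle = write_cycle
    for src, dst in zip(path, path[1:]):
        # Copy: read port of src -> bus -> write port of dst, in the earliest
        # later cycle where all three resources are free.
        cycle += 1
        while cycle < read_cycle:
            res = first_free(table, cycle, buses, lambda b: (f"{src}.r", b, f"{dst}.w"))
            if res is not None:
                break
            cycle += 1
        else:
            return None  # no room before the consumer issues
        stubs.append(Stub("copy", cycle, res))

    # Read stub: read port of the last register file -> bus -> consumer input.
    # The value must already be in that register file, so it needs a later cycle.
    if read_cycle <= cycle:
        return None
    res = first_free(table, read_cycle, buses, lambda b: (f"{path[-1]}.r", b))
    if res is None:
        return None
    stubs.append(Stub("read", read_cycle, res))

    for stub in stubs:
        table.claim(stub.cycle, stub.resources)
    return stubs


# Usage: bus0 is already held in cycle 1 by another communication,
# so the copy from rfA to rfB falls back to bus1.
table = ReservationTable()
table.claim(1, ("bus0",))
stubs = schedule_communication(table, write_cycle=0, read_cycle=3,
                               path=["rfA", "rfB"], buses=["bus0", "bus1"])
for stub in stubs:
    print(stub.kind, stub.cycle, stub.resources)
# write 0 ('bus0', 'rfA.w')
# copy 1 ('rfA.r', 'bus1', 'rfB.w')
# read 3 ('rfB.r', 'bus0')
```

Reservations are committed only after the whole route fits, so a failed attempt leaves no stray claims in the table. The sketch makes the abstract's difficulty concrete: whether a copy fits depends on bus and port usage by every other communication in the same cycle, which is why buses and ports have to be allocated together with the operations rather than after them.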

Published in
ASPLOS IX: Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
November 2000, 271 pages
ISBN: 1581133170
DOI: 10.1145/378993
Chairmen: Larry Rudolph (MIT, Cambridge, MA) and Anoop Gupta (Microsoft)

ACM SIGOPS Operating Systems Review, Volume 34, Issue 5
Dec. 2000, 269 pages
ISSN: 0163-5980
DOI: 10.1145/384264
Chairman: William E. Weihl (Akamai Technologies, Inc., San Mateo, CA)

ACM SIGARCH Computer Architecture News, Volume 28, Issue 5
Special Issue: Proceedings of the ninth international conference on Architectural support for programming languages and operating systems (ASPLOS '00)
Dec. 2000, 269 pages
ISSN: 0163-5964
DOI: 10.1145/378995
Editor: Doug DeGroot (Dallas, TX)
Copyright © 2000 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher: Association for Computing Machinery, New York, NY, United States
Acceptance Rates
ASPLOS IX paper acceptance rate: 24 of 114 submissions (21%). Overall acceptance rate: 535 of 2,713 submissions (20%).
Article Metrics
- 34 total citations
- 921 total downloads
- 71 downloads in the last 12 months
- 17 downloads in the last 6 weeks