TLSync: support for multiple fast barriers using on-chip transmission lines

Authors:

Milos Prvulovic,

Alenka Zajic

ISCA '11: Proceedings of the 38th annual international symposium on Computer architecture

Pages 105 - 116

https://doi.org/10.1145/2000064.2000078

Published: 04 June 2011

Abstract

As the number of cores on a single chip grows, scalable barrier synchronization becomes increasingly difficult to implement. In software implementations, such as the tournament barrier, a larger number of cores results in both a longer latency for each round and a larger number of rounds. Hardware barrier implementations require significant dedicated wiring, e.g., a reduction (arrival) tree and a notification (release) tree, and multiple instances of this wiring are needed to support multiple barriers (e.g., when concurrently executing multiple parallel applications).

This paper presents TLSync, a novel hardware barrier implementation that uses the high-frequency part of the spectrum in a transmission-line broadcast network, thus leaving the transmission-line network free for non-modulated (baseband) data transmission. In contrast to other hardware barrier implementations, TLSync allows each of multiple thread groups to have its own barrier. This is accomplished by allocating different bands in the radio-frequency spectrum to different groups. Our circuit-level and electromagnetic models show that the worst-case latency for a TLSync barrier is 4 ns to 10 ns, depending on the size of the frequency band allocated to each group, and our cycle-accurate architectural simulations show that low-latency TLSync barriers provide significant performance and scalability benefits to barrier-intensive applications.
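
The abstract's point about software barriers needing more rounds as core counts grow can be illustrated with a small sketch. It assumes the usual textbook formulation of a tournament barrier, in which threads are paired off in each round and one thread of each pair advances; the function name `tournament_rounds` is hypothetical and is not taken from the paper, and the sketch counts rounds only, without modeling per-round latency.

```python
def tournament_rounds(num_cores: int) -> int:
    """Count arrival rounds in a textbook tournament barrier (illustrative only)."""
    if num_cores < 1:
        raise ValueError("num_cores must be positive")
    rounds = 0
    remaining = num_cores
    while remaining > 1:
        # Threads pair off; one per pair advances, and an unpaired thread gets a bye.
        remaining = (remaining + 1) // 2
        rounds += 1
    return rounds


if __name__ == "__main__":
    for n in (2, 16, 64, 1024):
        print(f"{n} cores -> {tournament_rounds(n)} rounds")
```

Under this assumption the round count grows logarithmically with the number of cores, which is one of the two scaling costs the abstract attributes to software barriers.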

Index Terms

TLSync: support for multiple fast barriers using on-chip transmission lines
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multiple instruction, multiple data

Published In

ISCA '11: Proceedings of the 38th annual international symposium on Computer architecture

June 2011

488 pages

ISBN:9781450304726

DOI:10.1145/2000064

General Chairs: Ravi Iyer (Intel), Qing Yang (University of Rhode Island)
Program Chair: Antonio González (Intel and UPC)

ACM SIGARCH Computer Architecture News Volume 39, Issue 3
ISCA '11
June 2011
462 pages
ISSN:0163-5964
DOI:10.1145/2024723

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ISCA '11

Sponsor:

SIGARCH

ISCA '11: The 38th Annual International Symposium on Computer Architecture

June 4 - 8, 2011

San Jose, California, USA

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%
