DOI: 10.1145/2749469.2750396
research-article

MiSAR: minimalistic synchronization accelerator with resource overflow management

Published: 13 June 2015

ABSTRACT

While numerous hardware synchronization mechanisms have been proposed, they either stop functioning or suffer a large performance loss when their hardware resources are exceeded, or they add significant complexity and cost to handle such resource overflows. Additionally, prior hardware synchronization proposals each focus on a single type of synchronization (barriers or locks), so several mechanisms are likely to be needed to support real applications, many of which use locks, barriers, and/or condition variables.

This paper proposes MiSAR, a minimalistic synchronization accelerator (MSA) that supports all three commonly used types of synchronization (locks, barriers, and condition variables), and a novel overflow management unit (OMU) that dynamically manages the MSA's (very) limited hardware synchronization resources. The OMU allows safe and efficient dynamic transitions between hardware (MSA) and software synchronization implementations. As a result, the MSA's resources are used only for currently active synchronization operations, providing significant performance benefits even when the program uses many more synchronization variables than the MSA has entries. Because it allows safe transitions between hardware and software synchronization, the OMU also facilitates thread suspend/resume, migration, and other thread-management activities. Finally, the MSA/OMU combination decouples the instruction set support (how the program invokes hardware-supported synchronization) from the actual implementation of the accelerator, allowing different accelerators (or even wholesale removal of the accelerator) in the future without changes to OMU-compatible application or system code. We show that, even with only 2 MSA entries per tile, the MSA/OMU combination performs, on average, within 3% of ideal (zero-latency) synchronization and achieves a 1.43X speedup over the software (pthreads) implementation.
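The central idea of the OMU is that a tiny pool of hardware entries is handed only to synchronization variables that are currently in use, and a variable falls back to software when the pool is exhausted, switching modes only when that is safe. The Python sketch below is a toy model of that policy for locks only, under stated assumptions: the class `OverflowLockTable`, its `acquire`/`release` methods, and the rule that a variable may change mode only when no thread holds or waits on it are hypothetical, not the paper's MSA/OMU interface or its actual transition protocol. Both modes are backed by `threading.Lock`; the model only tracks which variables would occupy one of the `hw_entries` slots (defaulting to 2, echoing the abstract's per-tile configuration).

```python
import threading


class OverflowLockTable:
    """Toy model of a small lock accelerator with software fallback.

    Hypothetical illustration only (not MiSAR's MSA/OMU design). Mutual
    exclusion comes from a threading.Lock in both modes; the model only
    records which lock variables would occupy a scarce "hardware" entry.
    """

    def __init__(self, hw_entries=2):
        self._meta = threading.Lock()   # guards the bookkeeping below
        self._free_hw = hw_entries      # unused hardware entries
        self._locks = {}                # lock_id -> lock object ("in memory")
        self._users = {}                # lock_id -> threads holding or waiting
        self._mode = {}                 # lock_id -> "hw" or "sw" while in use
        self.stats = {"hw": 0, "sw": 0}

    def acquire(self, lock_id):
        with self._meta:
            users = self._users.get(lock_id, 0)
            if users == 0:
                # Nobody holds or waits on this variable, so choosing a new
                # mode cannot let two threads into the critical section.
                if self._free_hw > 0:
                    self._free_hw -= 1
                    self._mode[lock_id] = "hw"
                else:
                    self._mode[lock_id] = "sw"  # overflow: use software
            self._users[lock_id] = users + 1
            mode = self._mode[lock_id]
            self.stats[mode] += 1
            lock = self._locks.setdefault(lock_id, threading.Lock())
        lock.acquire()  # may block; done outside the bookkeeping lock
        return mode

    def release(self, lock_id):
        self._locks[lock_id].release()
        with self._meta:
            self._users[lock_id] -= 1
            if self._users[lock_id] == 0:
                # Last user gone: return a hardware entry to the free pool.
                if self._mode.pop(lock_id) == "hw":
                    self._free_hw += 1


# Example: with 2 hardware entries, a third concurrently held lock overflows.
table = OverflowLockTable(hw_entries=2)
print([table.acquire(x) for x in ("a", "b", "c")])  # ['hw', 'hw', 'sw']
for x in ("a", "b", "c"):
    table.release(x)
print(table.acquire("d"))  # hw (entries were freed when their users left)
table.release("d")
```

Tying the mode change to a zero user count is what keeps this sketch safe: a thread is counted from before it blocks on the lock until after it releases it, so a variable is never served in both modes at once, and an entry returns to the pool as soon as its variable goes idle.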


          Published in

            ACM Conferences
            ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture
            June 2015
            768 pages
            ISBN:9781450334020
            DOI:10.1145/2749469

            Copyright © 2015 ACM

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Acceptance Rates

            Overall acceptance rate: 543 of 3,203 submissions (17%)