DOI: 10.1145/2618128.2618130
research-article

O-structures: semantics for versioned memory

Published: 13 June 2014

Abstract

This paper introduces O-structures, a novel architectural memory element that can be used to facilitate parallelism in task-based execution models. Much like register renaming, each write to an O-structure creates a new version of program memory at that location. These versions can be accessed concurrently and out of program order. O-structures provide a set of semantics that match the needs of task-based execution models: tasks can synchronize on specific versions of memory, and they can coordinate access when the necessary version is not known at compile time.
In this work, we describe O-structures and provide their complete semantics. We also discuss how task-based execution of basic manipulations on common data structures (arrays, lists, trees, etc.) operates. We present results that measure the memory-level parallelism (MLP) exposed in these operations. For data structures that were previously difficult to parallelize, such as linked lists, binary trees, and sparse-matrix codes, we find significant memory-level parallelism (50 to 100 operations per cycle) when using O-structures.
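
The versioning idea in the abstract can be illustrated with a small software model. The sketch below is not the paper's hardware design or its instruction set. It is a toy Python class, written under the assumption that versions are identified by integers, in which each write adds a version, a reader can wait for one specific version, and a reader that does not know the exact version can ask for the newest one at or below a bound. The names `VersionedCell`, `write`, `read`, `read_latest`, and `demo` are hypothetical and do not come from the paper.

```python
import threading


class VersionedCell:
    """Toy model of one memory location that keeps every written version.

    Assumption: versions are plain integers chosen by the writer.
    """

    def __init__(self):
        self._versions = {}                 # version number -> value
        self._cond = threading.Condition()  # guards _versions, wakes waiting readers

    def write(self, version, value):
        """Create a new version instead of overwriting an old one."""
        with self._cond:
            if version in self._versions:
                raise ValueError(f"version {version} already written")
            self._versions[version] = value
            self._cond.notify_all()

    def read(self, version, timeout=None):
        """Block until the requested version exists, then return its value."""
        with self._cond:
            ready = self._cond.wait_for(
                lambda: version in self._versions, timeout)
            if not ready:
                raise TimeoutError(f"version {version} was never written")
            return self._versions[version]

    def read_latest(self, at_most):
        """Return (version, value) for the newest existing version <= at_most."""
        with self._cond:
            candidates = [v for v in self._versions if v <= at_most]
            if not candidates:
                raise KeyError(f"no version <= {at_most}")
            newest = max(candidates)
            return newest, self._versions[newest]


def demo():
    cell = VersionedCell()
    results = {}

    # A consumer task asks for version 2, possibly before it has been produced.
    consumer = threading.Thread(
        target=lambda: results.update(v2=cell.read(2, timeout=5)))
    consumer.start()

    # Producers write out of program order: version 2 first, then version 1.
    cell.write(2, "second")
    cell.write(1, "first")
    consumer.join()

    results["v1"] = cell.read(1)
    results["latest_le_1"] = cell.read_latest(1)
    return results
```

Calling `demo()` shows a reader synchronizing on a version that is written by another thread, while an older version written later remains readable. Rejecting a second write to the same version is a choice made for this sketch only; the abstract does not say how such a case is handled.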

Cited By

  • (2018) Architectural Support for Unlimited Memory Versioning and Renaming. 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 126-136. DOI: 10.1109/IPDPS.2018.00023. Online publication date: May 2018.
  • (2017) Towards a Deterministic Fine-Grained Task Ordering Using Multi-Versioned Memory. 2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp. 105-112. DOI: 10.1109/SBAC-PAD.2017.21. Online publication date: Oct 2017.

    Published In

    MSPC '14: Proceedings of the workshop on Memory Systems Performance and Correctness
    June 2014
    61 pages
    ISBN:9781450329170
    DOI:10.1145/2618128

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. O-structures
    2. memory renaming
    3. out-of-order

    Conference

    PLDI '14

    Acceptance Rates

    MSPC '14 Paper Acceptance Rate 6 of 20 submissions, 30%;
    Overall Acceptance Rate 6 of 20 submissions, 30%
