A unifying abstraction for data structure splicing

Authors:

Alexandra Fedorova

MEMSYS '19: Proceedings of the International Symposium on Memory Systems

Pages 173 - 183

https://doi.org/10.1145/3357526.3357548

Published: 30 September 2019

Abstract

Data structure splicing (DSS) refers to reorganizing data structures by merging or splitting them, reordering fields, inlining pointers, etc. DSS has been used, with demonstrated benefits, to improve spatial locality. When data fields that are accessed together are also collocated in the address space, the utilization of hardware caches improves and cache misses decline.
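
As a rough illustration of why splitting can help spatial locality, the sketch below uses Python's `ctypes` to mirror C struct layouts. The `Record` type, its field names, the hot/cold labels, and the 64-byte cache line are all hypothetical and are not taken from the paper.

```python
import ctypes

CACHE_LINE_BYTES = 64  # assumed line size (common on x86-64); not stated in the source


class Record(ctypes.Structure):
    """Original layout: one hot field stored next to cold fields."""
    _fields_ = [
        ("key", ctypes.c_uint64),         # hot: read on every lookup (assumption)
        ("name", ctypes.c_char * 48),     # cold: rarely read (assumption)
        ("created_at", ctypes.c_uint64),  # cold: rarely read (assumption)
    ]


class RecordHot(ctypes.Structure):
    """After splitting: only the hot field."""
    _fields_ = [("key", ctypes.c_uint64)]


class RecordCold(ctypes.Structure):
    """After splitting: the cold fields, stored in a separate array."""
    _fields_ = [
        ("name", ctypes.c_char * 48),
        ("created_at", ctypes.c_uint64),
    ]


def elements_per_line(struct_type, line_bytes=CACHE_LINE_BYTES):
    """Number of whole elements of struct_type that fit in one cache line."""
    return line_bytes // ctypes.sizeof(struct_type)


def lines_touched(struct_type, count, line_bytes=CACHE_LINE_BYTES):
    """Cache lines covered by a contiguous, line-aligned array of `count` elements."""
    total_bytes = ctypes.sizeof(struct_type) * count
    return -(-total_bytes // line_bytes)  # ceiling division
```

Under these assumptions, a scan over the keys of 1,000 elements covers `lines_touched(Record, 1000)` cache lines in the original layout and `lines_touched(RecordHot, 1000)` lines after the split, because each fetched line then carries eight keys instead of one.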

A number of approaches to DSS have been proposed, but each addressed only one or two splicing optimizations (e.g., only splitting or only field reordering) and used an underlying abstraction that could not be extended to include others. Our work proposes a single abstraction, called Data Structure Access Graph (D-SAG), that (a) covers all data-splicing optimizations proposed previously and (b) unlocks new ones. Having a common abstraction has two benefits: (1) It enables us to build a single tool that hosts all DSS optimizations under one roof, eliminating the need to adopt multiple tools. (2) It avoids conflicts, e.g., where one tool suggests splitting a data structure in a way that would conflict with another tool's suggestion to reorder fields.
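
The abstract names D-SAG but does not define it, so the following is only a guess at the general shape of such an abstraction, not the paper's definition. It assumes that nodes are fields, that an edge weight counts how often two fields appear within a small window of an access trace, and that fields are grouped by a simple weight threshold. The trace, the field names, the window size, and the threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_access_graph(trace, window=2):
    """Count how often two distinct fields occur within `window` consecutive accesses."""
    weights = Counter()
    for i in range(len(trace) - window + 1):
        fields = sorted(set(trace[i:i + window]))
        for a, b in combinations(fields, 2):
            weights[(a, b)] += 1
    return weights


def group_fields(fields, weights, threshold):
    """Group fields connected by edges of weight >= threshold (connected components)."""
    parent = {f: f for f in fields}

    def find(f):
        # Union-find lookup with path halving.
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    for (a, b), w in weights.items():
        if w >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for f in fields:
        groups.setdefault(find(f), []).append(f)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical field-access trace for a single data structure.
trace = ["key", "next"] * 3 + ["name", "created_at"] * 2 + ["key", "next"]
weights = build_access_graph(trace)
groups = group_fields(["key", "next", "name", "created_at"], weights, threshold=2)
```

Because one graph yields one grouping, a split decision (which fields move to a separate structure) and a reorder decision (which fields sit next to each other) are read off the same data and cannot contradict each other, which is the kind of conflict avoidance the abstract describes.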

Based on the D-SAG abstraction, we build a toolchain that uses static and dynamic analysis to recommend DSS optimizations to developers. Using this tool, we identify ten benchmarks from the SPEC CPU2017 and PARSEC suites that are amenable to DSS, as well as a workload on RocksDB that stresses its memory table. Restructuring data structures following the tool's suggestion improves performance by an average of 11% (geomean) and reduces cache misses by an average of 28% (geomean) for seven of these workloads.
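
For readers unfamiliar with the "(geomean)" qualifier, the sketch below shows how a geometric-mean improvement over several workloads is typically computed from per-workload speedup ratios. The runtimes are made-up placeholders, not measurements from the paper, and the function names are illustrative.

```python
import math


def geomean(ratios):
    """Geometric mean of positive ratios, computed in log space."""
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("ratios must be a non-empty sequence of positive numbers")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


def percent_improvement(baseline_times, optimized_times):
    """Geomean speedup across workloads, expressed as a percentage gain."""
    speedups = [b / o for b, o in zip(baseline_times, optimized_times)]
    return (geomean(speedups) - 1.0) * 100.0


# Hypothetical runtimes in seconds; these are NOT results from the paper.
baseline = [12.1, 10.0]
optimized = [10.0, 10.0]
gain = percent_improvement(baseline, optimized)
```

The geometric mean is the usual choice for averaging normalized ratios such as speedups, since a single outlier workload skews it less than it would an arithmetic mean.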

Published In

MEMSYS '19: Proceedings of the International Symposium on Memory Systems

September 2019

517 pages

ISBN:9781450372060

DOI:10.1145/3357526

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

MEMSYS '19: The International Symposium on Memory Systems

September 30 - October 3, 2019

Washington, District of Columbia, USA
