skip to main content
10.1145/3357526.3357548acmotherconferencesArticle/Chapter ViewAbstractPublication PagesmemsysConference Proceedingsconference-collections
research-article

A unifying abstraction for data structure splicing

Published: 30 September 2019 Publication History

Abstract

Data structure splicing (DSS) refers to reorganizing data structures by merging or splitting them, reordering fields, inlining pointers, etc. DSS has been used, with demonstrated benefits, to improve spatial locality. When data fields that are accessed together are also collocated in the address space, the utilization of hardware caches improves and cache misses decline.
A number of approaches to DSS have been proposed, but each addressed only one or two splicing optimizations (e.g., only splitting or only field reordering) and used an underlying abstraction that could not be extended to include others. Our work proposes a single abstraction, called Data Structure Access Graph (D-SAG), that (a) covers all data-splicing optimizations proposed previously and (b) unlocks new ones. Having a common abstraction has two benefits: (1) It enables us to build a single tool that hosts all DSS optimizations under one roof, eliminating the need to adopt multiple tools. (2) It avoids conflicts: e.g., where one tool suggests to split a data structure in a way that would conflict with another tool's suggestion to reorder fields.
Based on the D-SAG abstraction, we build a toolchain that uses static and dynamic analysis to recommend DSS optimizations to developers. Using this tool, we identify ten benchmarks from the SPEC CPU2017 and PARSEC suites that are amenable to DSS, as well as a workload on RocksDB that stresses its memory table. Restructuring data structures following the tool's suggestion improves performance by an average of 11% (geomean) and reduces cache misses by an average of 28% (geomean) for seven of these workloads.

References

[1]
2019. perf: Linux profiling with performance counters. https://perf.wiki.kernel.org/
[2]
2019. Performance Analysis Guide for Intel® CoreTM i7 Processor and Intel® XeonTM 5500 processors. https://software.intel.com/sites/products/collateral/hpc/vtune/performance_analysis_guide.pdf
[3]
2019. RocksDB | A persistent key-value store. https://rocksdb.org/
[4]
2019. Standard Performance Evaluation Corporation. https://www.spec.org/
[5]
Christian Bienia. 2011. Benchmarking Modern Multiprocessors. Ph.D. Dissertation. Princeton University.
[6]
Vincent Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. 2008. Fast Unfolding of Communities in Large Networks. Journal of Statistical Mechanics Theory and Experiment 2008 (04 2008).
[7]
Trishul M. Chilimbi, Bob Davidson, and James R. Larus. 1999. Cache-conscious Structure Definition. In Proceedings of the ACM SIGPLAN 1999 Conference on Programming Language Design and Implementation (PLDI '99). ACM, New York, NY, USA, 13--24.
[8]
Julian Dolby and Andrew Chien. 2000. An Automatic Object Inlining Optimization and Its Evaluation. In Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation (PLDI '00). ACM, New York, NY, USA, 345--357.
[9]
Jan Edler and Mark D. Hill. 1998. Dinero IV: trace-driven uniprocessor cache simulator.
[10]
Taees Eimouri, Kenneth B. Kent, Aleksandar Micic, and Karl Taylor. 2016. Using Field Access Frequency to Optimize Layout of Objects in the JVM. In Proceedings of the 31st Annual ACM Symposium on Applied Computing (SAC '16). ACM, New York, NY, USA, 1815--1818.
[11]
Michael Ferdman, Almutaz Adileh, Onur Kocberber, Stavros Volos, Mohammad Alisafaee, Djordje Jevdjic, Cansu Kaynak, Adrian Daniel Popescu, Anastasia Ailamaki, and Babak Falsafi. 2012. Clearing the Clouds: A Study of Emerging Scale-out Workloads on Modern Hardware. In Proceedings of the Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XVII). ACM, New York, NY, USA, 37--48.
[12]
Michael R. Garey and David S. Johnson. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York, NY, USA.
[13]
Robert Hundt, Sandya Mannarswamy, and Dhruva Chakrabarti. 2006. Practical Structure Layout Optimization and Advice. In Proceedings of the International Symposium on Code Generation and Optimization (CGO '06). IEEE Computer Society, Washington, DC, USA, 233--244.
[14]
S. Kumar, H. Zhao, A. Shriraman, E. Matthews, S. Dwarkadas, and L. Shannon. 2012. Amoeba-Cache: Adaptive Blocks for Eliminating Waste in the Memory Hierarchy. In 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture. 376--388.
[15]
Rahman Lavaee. 2016. The Hardness of Data Packing. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '16). ACM, New York, NY, USA, 232--242.
[16]
Jin Lin and Pen-Chung Yew. 2010. A Compiler Framework for General Memory Layout Optimizations Targeting Structures. In Proceedings of the 2010 Workshop on Interaction Between Compilers and Computer Architecture (INTERACT-14). ACM, New York, NY, USA, Article 5, 8 pages.
[17]
R. L. Mattson, J. Gecsei, D. R. Slutz, and I. L. Traiger. 1970. Evaluation Techniques for Storage Hierarchies. IBM Syst. J. 9, 2 (June 1970), 78--117.
[18]
Michael J. Eager. 2012. Introduction to the DWARF Debugging Format. http://www.dwarfstd.org/doc/Debugging%20using%20DWARF-2012.pdf
[19]
Svetozar Miucin, Conor Brady, and Alexandra Fedorova. 2016. End-to-end Memory Behavior Profiling with DINAMITE. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016). ACM, New York, NY, USA, 1042--1046.
[20]
Svetozar Miucin and Alexandra Fedorova. 2018. Data-driven Spatial Locality (Memsys 2018). ACM, New York, NY, USA.
[21]
M. E. J. Newman. 2006. Modularity and community structure in networks. Proceedings of the National Academy of Sciences 103, 23 (2006), 8577--8582. arXiv:https://www.pnas.org/content/103/23/8577.full.pdf
[22]
Moinuddin K Qureshi, M Aater Suleman, and Yale N Patt. 2007. Line distillation: Increasing cache capacity by filtering unused words in cache lines. In 2007 IEEE 13th International Symposium on High Performance Computer Architecture (HPCA).
[23]
Peng Zhao, Shimin Cui, Yaoqing Gao, Raúl Silvera, and José Amaral. 2005. Forma: A framework for safe automatic array reshaping. ACM Transactions on Programming Languages and Systems (TOPLAS) 30 (01 2005), 2.
[24]
Yutao Zhong, Maksim Orlovich, Xipeng Shen, and Chen Ding. 2004. Array Regrouping and Structure Splitting Using Whole-program Reference Affinity. In Proceedings of the ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation (PLDI '04). ACM, New York, NY, USA, 255--266.

Cited By

View all
  • (2024)Region-Based Data Layout via Data Reuse AnalysisProceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction10.1145/3640537.3641571(49-59)Online publication date: 17-Feb-2024
  • (2024)Representing Data Collections in an SSA FormProceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization10.1109/CGO57630.2024.10444817(308-321)Online publication date: 2-Mar-2024
  • (2022)STAFF: A Model for Structure Layout Optimization2022 7th International Conference on Computer and Communication Systems (ICCCS)10.1109/ICCCS55155.2022.9846314(115-122)Online publication date: 22-Apr-2022
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Other conferences
MEMSYS '19: Proceedings of the International Symposium on Memory Systems
September 2019
517 pages
ISBN:9781450372060
DOI:10.1145/3357526
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 September 2019

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. CPU cache
  2. data structure design and analysis
  3. memory performance

Qualifiers

  • Research-article

Conference

MEMSYS '19
MEMSYS '19: The International Symposium on Memory Systems
September 30 - October 3, 2019
District of Columbia, Washington, USA

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)10
  • Downloads (Last 6 weeks)2
Reflects downloads up to 30 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Region-Based Data Layout via Data Reuse AnalysisProceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction10.1145/3640537.3641571(49-59)Online publication date: 17-Feb-2024
  • (2024)Representing Data Collections in an SSA FormProceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization10.1109/CGO57630.2024.10444817(308-321)Online publication date: 2-Mar-2024
  • (2022)STAFF: A Model for Structure Layout Optimization2022 7th International Conference on Computer and Communication Systems (ICCCS)10.1109/ICCCS55155.2022.9846314(115-122)Online publication date: 22-Apr-2022
  • (2021)The Granularity Gap Problem: A Hurdle for Applying Approximate Memory to Complex Data LayoutProceedings of the ACM/SPEC International Conference on Performance Engineering10.1145/3427921.3450259(125-132)Online publication date: 9-Apr-2021

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media