research-article

Exploiting cache locality to speedup register clustering

Authors:
Tiago Augusto Fontana
Sheiny Almeida
Renan Netto
Vinicius Livramento
Chrystian Guth
Laércio Pilla
José Luis Güntzel

Affiliation (all authors): Federal University of Santa Catarina (UFSC), Florianópolis, Brazil

SBCCI '17: Proceedings of the 30th Symposium on Integrated Circuits and Systems Design: Chip on the Sands, August 2017, pages 191–197. https://doi.org/10.1145/3109984.3110005

Published: 28 August 2017


ABSTRACT

Physical design tools must handle huge amounts of data to solve problems for circuits with millions of cells. Traditionally, Electronic Design Automation (EDA) tools are implemented using Object-Oriented Design. However, this paradigm may lead to overly complex objects that waste cache memory space. This waste hinders the exploitation of cache locality and, consequently, degrades software runtime. This work proposes applying Data-Oriented Design to the register clustering problem. Unlike traditional Object-Oriented Design, the Data-Oriented Design programming model focuses on how data is organized in memory. As a consequence, this programming model may better exploit cache spatial locality. To evaluate the impact of using the Data-Oriented Design programming model for register clustering, we implemented two software prototypes of the K-means clustering algorithm (a sequential and a parallel implementation) for each programming model. Experimental results showed that the sequential Data-Oriented Design implementation is on average 7.5% faster than the Object-Oriented Design implementation, while the parallel Data-Oriented Design version is 15% faster than its Object-Oriented counterpart.
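The layout difference the abstract describes can be made concrete with a minimal sketch. This is not the authors' implementation; it assumes a hypothetical `Register` class as the Object-Oriented flip-flop object (carrying fields such as `capacitance` and `fanout` that the clustering step never reads) and a hypothetical `RegisterPositions` structure of arrays as the Data-Oriented layout, and it runs the same Lloyd-style K-means iterations over both. Python is used only to show the two layouts and the algorithm side by side: CPython boxes every value it reads, so this sketch will not reproduce the reported speedups, which come from compiled code where a packed array of coordinates fills each cache line with data the assignment loop actually uses. The assignment loop is independent for each register, which is the part a parallel version (for example, one using OpenMP) would split across threads; that variant is not shown.

```python
from array import array
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


# Hypothetical Object-Oriented register: coordinates live next to attributes
# that the clustering step never reads.
@dataclass
class Register:
    name: str
    x: float
    y: float
    capacitance: float = 0.0                         # unused by K-means
    fanout: List[str] = field(default_factory=list)  # unused by K-means
    cluster: int = -1


# Hypothetical Data-Oriented layout: one contiguous, typed array per attribute.
@dataclass
class RegisterPositions:
    x: array        # array("d"): packed doubles
    y: array        # array("d"): packed doubles
    cluster: array  # array("i"): packed ints


def nearest(cx: List[float], cy: List[float], px: float, py: float) -> int:
    """Index of the center closest to (px, py) by squared Euclidean distance."""
    best, best_d = 0, float("inf")
    for k in range(len(cx)):
        d = (px - cx[k]) ** 2 + (py - cy[k]) ** 2
        if d < best_d:
            best, best_d = k, d
    return best


def update_centers(samples: Iterable[Tuple[float, float, int]],
                   cx: List[float], cy: List[float]) -> None:
    """Move each center to the mean of the (x, y, label) samples assigned to it."""
    sx, sy, n = [0.0] * len(cx), [0.0] * len(cx), [0] * len(cx)
    for x, y, c in samples:
        sx[c] += x
        sy[c] += y
        n[c] += 1
    for k in range(len(cx)):
        if n[k]:  # an empty cluster keeps its previous center
            cx[k], cy[k] = sx[k] / n[k], sy[k] / n[k]


def kmeans_ood(regs: List[Register], cx, cy, iterations: int = 10):
    cx, cy = list(cx), list(cy)
    for _ in range(iterations):
        for r in regs:  # assignment walks whole Register objects
            r.cluster = nearest(cx, cy, r.x, r.y)
        update_centers(((r.x, r.y, r.cluster) for r in regs), cx, cy)
    return cx, cy


def kmeans_dod(pos: RegisterPositions, cx, cy, iterations: int = 10):
    cx, cy = list(cx), list(cy)
    for _ in range(iterations):
        for i in range(len(pos.x)):  # assignment reads only the x and y arrays
            pos.cluster[i] = nearest(cx, cy, pos.x[i], pos.y[i])
        update_centers(zip(pos.x, pos.y, pos.cluster), cx, cy)
    return cx, cy


# Usage: two well-separated groups of three registers each.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
ood_regs = [Register(f"ff{i}", x, y) for i, (x, y) in enumerate(points)]
dod_pos = RegisterPositions(array("d", [x for x, _ in points]),
                            array("d", [y for _, y in points]),
                            array("i", [-1] * len(points)))
ood_centers = kmeans_ood(ood_regs, [1.0, 9.0], [1.0, 9.0])
dod_centers = kmeans_dod(dod_pos, [1.0, 9.0], [1.0, 9.0])
print(ood_centers == dod_centers)  # True: both layouts give identical centers
```

Both runs converge to centers at (1/3, 1/3) and (31/3, 31/3); the algorithm is identical, and only the memory layout walked by the assignment loop differs.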


Index Terms

1. Hardware
  1. Electronic design automation
    1. Methodologies for EDA
      1. Software tools for EDA
    2. Physical design (EDA)
2. Software and its engineering
  1. Software creation and management
    1. Software development techniques


Published in
SBCCI '17: Proceedings of the 30th Symposium on Integrated Circuits and Systems Design: Chip on the Sands
August 2017
238 pages
ISBN: 9781450351065
DOI: 10.1145/3109984
General Chair: Jarbas A. N. Silveira (UFC, Brazil)
Copyright © 2017 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
cache locality
data-oriented design
electronic design automation
physical design
register clustering

Acceptance Rates
Overall acceptance rate: 133 of 347 submissions, 38%