ABSTRACT
Physical design tools must handle huge amounts of data to solve problems for circuits with millions of cells. Traditionally, Electronic Design Automation tools are implemented using Object-Oriented Design. However, this paradigm may lead to overly complex objects that waste cache memory space. This waste hinders the exploitation of cache locality and, consequently, degrades software runtime. This work proposes applying Data-Oriented Design to the register clustering problem. Unlike traditional Object-Oriented Design, the Data-Oriented Design programming model focuses on how data is organized in memory. As a consequence, this programming model may better exploit cache spatial locality. To evaluate the impact of using the Data-Oriented Design programming model for register clustering, we implemented two software prototypes (a sequential and a parallel implementation) of the K-means clustering algorithm for each programming model. Experimental results show that the sequential Data-Oriented Design implementation is on average 7.5% faster than the Object-Oriented Design implementation, while the parallel Data-Oriented Design implementation is 15% faster than its Object-Oriented counterpart.
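The difference between the two programming models can be illustrated with a minimal C++ sketch of the K-means assignment step. This is not the authors' implementation: the type and function names (`RegisterObject`, `Positions`, `assign_ood`, `assign_dod`) and the extra object fields are assumptions chosen only to show the two memory layouts. In the object-oriented variant each register object also carries data the kernel never reads, whereas the data-oriented variant keeps the coordinates in contiguous arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Object-oriented layout (hypothetical): every register is one object, so
// fields the clustering kernel never reads (name, cell_type) sit between the
// coordinates of consecutive registers in memory.
struct RegisterObject {
    std::string name;
    double x;
    double y;
    int cell_type;
};

// Data-oriented layout (hypothetical): one contiguous array per property.
// The same type is reused here for register positions and cluster centers.
struct Positions {
    std::vector<double> x;
    std::vector<double> y;
};

// Index of the center closest to (px, py), using squared Euclidean distance.
std::size_t nearest_center(double px, double py, const Positions& centers) {
    assert(centers.x.size() == centers.y.size());
    std::size_t best = 0;
    double best_dist = std::numeric_limits<double>::max();
    for (std::size_t k = 0; k < centers.x.size(); ++k) {
        const double dx = px - centers.x[k];
        const double dy = py - centers.y[k];
        const double dist = dx * dx + dy * dy;
        if (dist < best_dist) {
            best_dist = dist;
            best = k;
        }
    }
    return best;
}

// Assignment step over the object-oriented layout: the loop strides over
// whole objects even though it only uses x and y.
std::vector<std::size_t> assign_ood(const std::vector<RegisterObject>& regs,
                                    const Positions& centers) {
    std::vector<std::size_t> cluster(regs.size());
    for (std::size_t i = 0; i < regs.size(); ++i) {
        cluster[i] = nearest_center(regs[i].x, regs[i].y, centers);
    }
    return cluster;
}

// Assignment step over the data-oriented layout: the loop reads two dense
// arrays, so each fetched cache line holds only coordinates.
std::vector<std::size_t> assign_dod(const Positions& regs,
                                    const Positions& centers) {
    assert(regs.x.size() == regs.y.size());
    std::vector<std::size_t> cluster(regs.x.size());
    for (std::size_t i = 0; i < regs.x.size(); ++i) {
        cluster[i] = nearest_center(regs.x[i], regs.y[i], centers);
    }
    return cluster;
}
```

Both functions compute the same assignment; only the memory layout they traverse differs. The iterations of the per-register loop are independent of one another, which is what makes a parallel version of this step straightforward in either model.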
Index Terms
- Exploiting cache locality to speedup register clustering