Abstract
Determining the optimal microarchitecture configuration of a processor in the early stages of design is a significant challenge. Because the microarchitecture exposes many parameters, finding a combination of them that yields a balanced design is difficult. Application-specific design space exploration (DSE) is harder still, because the characteristics of the target application must also be considered during the DSE process. Improving both the speed and the accuracy of DSE therefore remains a key challenge in microprocessor design.
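To make the scale of the problem concrete, the following sketch counts the configurations in a full-factorial design space and the time needed to simulate all of them. The parameter names, candidate values, and the 30-minute cost per simulation are hypothetical figures chosen only for illustration; they are not the parameter set or simulation cost used by CSMO-DSE.

```python
from math import prod

# Hypothetical microarchitecture parameters and candidate values
# (illustrative only; not the parameter set used by CSMO-DSE).
PARAMS = {
    "fetch_width": [1, 2, 4, 8],
    "rob_entries": [32, 64, 128, 192, 256],
    "iq_entries": [16, 32, 64, 96],
    "lsq_entries": [16, 32, 48, 64],
    "int_alus": [1, 2, 3, 4, 6],
    "l1d_kb": [16, 32, 64],
    "l2_kb": [256, 512, 1024, 2048],
}


def design_space_size(params):
    """Number of distinct configurations in a full-factorial design space."""
    return prod(len(values) for values in params.values())


def exhaustive_sim_days(n_configs, minutes_per_sim):
    """Wall-clock days needed to simulate every configuration once, serially."""
    return n_configs * minutes_per_sim / (60 * 24)


n = design_space_size(PARAMS)
print(n)
print(exhaustive_sim_days(n, 30))
```

With these assumed values the space contains 19,200 configurations, so simulating each one once would take 400 days for a single benchmark. Exhaustive search is clearly impractical, which is why DSE methods try to reach good configurations with far fewer simulations.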
In this article, we propose a novel processor DSE methodology based on criticality and sensitivity analysis, named Criticality and Sensitivity-based Multi-Objective DSE (CSMO-DSE). In our methodology, a dependence graph is derived from the profile generated by running a program on an instrumented cycle-accurate microprocessor simulator. The criticality of the processor's performance events is then obtained through critical path analysis, and the sensitivity of microarchitecture parameters to various performance events is also analyzed. This information is used to optimize the performance, power/area, and energy efficiency of the design. Experiments with SPEC CPU2006 show that the CSMO-DSE methodology is 4.73× faster than the baseline DSE methodology and that its quality of results (QoR) is better than that of the baseline methodology for all benchmark programs.
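The criticality step can be illustrated on a toy dependence graph. The sketch below assumes a simplified graph in which each edge carries a latency in cycles and a hypothetical performance-event label (for example `l2_miss`). It finds the critical path as the longest path in the DAG using dynamic programming in topological order, attributes the critical-path cycles to event labels, and estimates how many cycles the critical path loses when one event class's latency is halved. That finite-difference measure is only a stand-in for sensitivity analysis and is not necessarily the authors' procedure, which relates microarchitecture parameters to performance events.

```python
from collections import defaultdict, deque

# Hypothetical toy dependence graph: (src, dst, latency_cycles, event_label).
# Graphs built from real simulator profiles are far larger and finer-grained.
TOY_EDGES = [
    (0, 1, 1, "alu"),
    (0, 2, 12, "branch_mispredict"),
    (1, 3, 200, "l2_miss"),
    (2, 3, 1, "alu"),
    (2, 4, 10, "l1d_hit"),
    (3, 4, 3, "alu"),
]


def critical_path(num_nodes, edges):
    """Longest path in a DAG, relaxing edges in topological (Kahn) order.

    Returns (path_length, list_of_edges_on_the_path)."""
    succ = defaultdict(list)
    indeg = [0] * num_nodes
    for e in edges:
        succ[e[0]].append(e)
        indeg[e[1]] += 1
    dist = [0] * num_nodes          # longest distance reaching each node
    best_in = [None] * num_nodes    # incoming edge that realises dist[v]
    queue = deque(v for v in range(num_nodes) if indeg[v] == 0)
    while queue:
        u = queue.popleft()
        for e in succ[u]:
            _, v, lat, _ = e
            if dist[u] + lat > dist[v]:
                dist[v] = dist[u] + lat
                best_in[v] = e
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    end = max(range(num_nodes), key=lambda v: dist[v])
    path, v = [], end
    while best_in[v] is not None:   # walk back to a source node
        path.append(best_in[v])
        v = best_in[v][0]
    return dist[end], path[::-1]


def event_criticality(path_edges):
    """Fraction of critical-path cycles attributed to each event label."""
    total = sum(e[2] for e in path_edges)
    share = defaultdict(float)
    if total == 0:
        return {}
    for _, _, lat, ev in path_edges:
        share[ev] += lat / total
    return dict(share)


def latency_sensitivity(num_nodes, edges, event, scale=0.5):
    """Critical-path cycles saved when one event class's latency is scaled."""
    base, _ = critical_path(num_nodes, edges)
    scaled = [(s, d, lat * scale if ev == event else lat, ev)
              for s, d, lat, ev in edges]
    new, _ = critical_path(num_nodes, scaled)
    return base - new


if __name__ == "__main__":
    length, path = critical_path(5, TOY_EDGES)
    print(length, [e[3] for e in path])
    print(event_criticality(path))
    print(latency_sensitivity(5, TOY_EDGES, "l2_miss"))
```

In this toy graph the critical path is 204 cycles, of which the L2 miss accounts for 200. Halving the L2-miss latency saves 100 cycles, while halving the branch-misprediction penalty saves nothing because that edge lies off the critical path. This is the intuition behind using criticality to steer DSE: resources spent on non-critical events do not improve performance.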
Index Terms
- CSMO-DSE: Fast and Precise Application-driven DSE Guided by Criticality and Sensitivity Analysis