CSMO-DSE: Fast and Precise Application-driven DSE Guided by Criticality and Sensitivity Analysis

Published: 30 January 2020

Abstract

Determining the optimal microarchitecture configuration of a processor at the early stages of design is a challenge. Because the microarchitecture level exposes many parameters, finding a combination of them that yields a balanced design is difficult. Application-specific Design Space Exploration (DSE) is even more difficult, since the properties of the application must be considered during the DSE process. Improving the speed and accuracy of the DSE process remains a particular challenge in microprocessor design.

In this article, we propose a novel processor DSE methodology based on criticality and sensitivity analysis, named Criticality and Sensitivity-based Multi-Objective DSE (CSMO-DSE). In our methodology, a dependence graph is derived from the profile generated by running a program on an instrumented cycle-accurate microprocessor simulator. The criticality of the processor’s performance events is then obtained through critical path analysis, and the sensitivity of microarchitecture parameters to various performance events is also analyzed. This information is then used to optimize the performance, power/area, and energy efficiency of the design. Experiments with SPEC 2006 show that the CSMO-DSE methodology is 4.73× faster than the baseline DSE methodology and that its quality of result (QoR) is better than the baseline's for all the benchmark programs.
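As a rough illustration of the critical path analysis step, the sketch below finds the longest path through a small weighted dependence graph and attributes its cycles to the performance event labeling each edge. It is a generic longest-path computation and not the authors' implementation: the abstract does not specify the graph model, so the node numbering, the edge latencies, and the event names (`fetch`, `alu`, `dcache_miss`, `commit`) are all hypothetical.

```python
from collections import defaultdict


def critical_path(num_nodes, edges):
    """Return (length, cycles_per_event) for the longest path in a DAG.

    Assumption: nodes are numbered in topological order, so every edge
    (src, dst, latency, event) satisfies src < dst.
    """
    dist = [0] * num_nodes      # longest distance found to each node
    pred = [None] * num_nodes   # (src, latency, event) of the best incoming edge

    # Visiting edges by ascending source means dist[src] is final when used.
    for src, dst, latency, event in sorted(edges, key=lambda e: e[0]):
        if dist[src] + latency > dist[dst]:
            dist[dst] = dist[src] + latency
            pred[dst] = (src, latency, event)

    # Walk back from the latest-finishing node, charging cycles to events.
    node = max(range(num_nodes), key=dist.__getitem__)
    length = dist[node]
    cycles = defaultdict(int)
    while pred[node] is not None:
        src, latency, event = pred[node]
        cycles[event] += latency
        node = src
    return length, dict(cycles)


# Hypothetical five-node graph: two paths from node 1 to node 4.
EDGES = [
    (0, 1, 1, "fetch"),
    (1, 2, 3, "alu"),
    (1, 3, 10, "dcache_miss"),
    (2, 4, 1, "commit"),
    (3, 4, 1, "commit"),
]

length, cycles = critical_path(5, EDGES)
# Share of the critical path attributed to each event type.
criticality = {event: c / length for event, c in cycles.items()}
```

In this toy graph the path through the data-cache miss dominates, so `dcache_miss` receives most of the criticality; a criticality-guided exploration would presumably look first at parameters that affect that event.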

      Published in

      ACM Journal on Emerging Technologies in Computing Systems, Volume 16, Issue 2
      April 2020
      261 pages
      ISSN: 1550-4832
      EISSN: 1550-4840
      DOI: 10.1145/3375712
      Editor: Zhaojun Bai

      Copyright © 2020 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 30 January 2020
      • Accepted: 1 October 2019
      • Revised: 1 May 2019
      • Received: 1 August 2018
      Qualifiers

      • research-article
      • Research
      • Refereed
