DOI: 10.1145/3352460.3358293
research-article

SWQUE: A Mode Switching Issue Queue with Priority-Correcting Circular Queue

Published: 12 October 2019

ABSTRACT

The improvement of single-thread performance is much needed. Among the many structures that comprise a processor, the issue queue (IQ) is one of the most important for achieving high single-thread performance. Correctly assigning the issue priority and providing high capacity efficiency are both key features, but no conventional IQ organization sufficiently provides both.

In this paper, we propose an IQ called the switching issue queue (SWQUE), which dynamically configures the IQ as either a modified circular queue (CIRC-PC) or a random queue with an age matrix (AGE), depending on the degree of capacity demand. CIRC-PC corrects the issue priority when wrap-around occurs by exploiting the finding that instructions that are wrapped around are latency-tolerant. CIRC-PC is used for phases in which capacity efficiency is less important and the correct priority is more important, and AGE is used for phases in which capacity efficiency is more important. Our evaluation results using SPEC2017 benchmark programs show that SWQUE achieved higher performance than AGE, which is widely used in current processors, by 9.7% and 2.9% on average (up to 24.4% and 10.6%) for integer and floating-point programs, respectively.
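
The wrap-around priority problem and the mode switch described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the paper's design: the abstract does not give the CIRC-PC correction logic or the capacity-demand metric, so the `Entry` fields, the scan order in `select_wrap_corrected`, and the occupancy threshold in `choose_mode` are all hypothetical names and policies chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    age: int     # dispatch order, smaller means older (hypothetical field)
    ready: bool  # True when all source operands are available


def select_position_only(slots: List[Optional[Entry]]) -> Optional[int]:
    """Fixed-priority select: pick the ready entry at the lowest physical index.

    After wrap-around, low indices hold the youngest instructions,
    so this can favor a younger instruction over an older one.
    """
    for i, entry in enumerate(slots):
        if entry is not None and entry.ready:
            return i
    return None


def select_wrap_corrected(slots: List[Optional[Entry]], head: int) -> Optional[int]:
    """Assumed correction: scan from the head first, wrapped region last.

    Entries in [head, N) are searched before entries in [0, head),
    so wrapped-around instructions receive lower priority.
    """
    n = len(slots)
    for offset in range(n):
        i = (head + offset) % n
        entry = slots[i]
        if entry is not None and entry.ready:
            return i
    return None


def choose_mode(occupancy: int, capacity: int, threshold: float = 0.75) -> str:
    """Hypothetical switch policy: treat high occupancy as high capacity demand."""
    return "AGE" if occupancy / capacity >= threshold else "CIRC-PC"


# A 4-entry queue whose head is at index 2; index 0 holds a wrapped-around entry.
slots = [Entry(age=2, ready=True), None, Entry(age=0, ready=True), Entry(age=1, ready=False)]
naive_pick = select_position_only(slots)               # index 0 (the youngest entry)
corrected_pick = select_wrap_corrected(slots, head=2)  # index 2 (the oldest entry)
```

In this example the position-only select issues the wrapped-around instruction at index 0 ahead of the older ready instruction at index 2, while the corrected scan restores the older-first order. A real design would need a hardware-friendly correction and a measured demand signal rather than the simple occupancy ratio assumed here.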

    Published in

      MICRO '52: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture
      October 2019
      1104 pages
      ISBN:9781450369381
      DOI:10.1145/3352460

      Copyright © 2019 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      Overall Acceptance Rate: 484 of 2,242 submissions, 22%
