Active Store Window: Enabling Far Store-Load Forwarding with Scalability and Complexity-Efficiency

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Conventional dynamically scheduled processors often use a fully associative structure, the load/store queue (LSQ), to communicate values from older in-flight stores to loads and to detect store-load ordering violations. However, this in-flight forwarding accounts for only about 15% of all store-load communication, which makes the CAM-based microarchitecture the main bottleneck to scaling store-load communication further. This paper presents a new microarchitecture named ASW (short for active store window). Its central structure, the speculative active store window, implements more aggressive speculative store-load forwarding than a conventional LSQ: it can forward the data of committed stores to executing loads without accessing the L1 data cache, which this paper refers to as far forwarding. At the back end of the pipeline, ASW uses in-order load re-execution, filtered by a tagged SSBF (short for store sequence Bloom filter), to verify the correctness of store-load forwarding. Both the speculative active store window and the tagged SSBF are set-associative structures, which are more efficient and scalable than fully associative ones. Experiments show that this simpler and faster design outperforms a conventional LSQ-based design and the NoSQ design on most benchmarks, by 10.22% and 8.71% respectively.
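The abstract describes two cooperating mechanisms: far forwarding from a set-associative window of committed stores, and commit-time verification in which a tagged SSBF decides which loads must re-execute in order. The Python sketch below models both under stated assumptions and is not the authors' design. The geometry (`NUM_SETS`, `WAYS`), the FIFO window replacement, the oldest-SSN filter replacement, the `memory` dict standing in for the L1 data cache, the `evicted_max` fallback on a filter miss, and the class names `ActiveStoreWindow`, `TaggedSSBF` and `ToyCore` are all hypothetical. The SSN comparison follows the store-sequence filtering idea from Roth's store vulnerability window (reference 16).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical geometry: the abstract does not give ASW's real sizes.
NUM_SETS = 4
WAYS = 2


@dataclass
class CommittedStore:
    addr: int
    data: int
    ssn: int  # store sequence number, assigned in commit order


class ActiveStoreWindow:
    """Set-associative window of recently committed stores (sketch)."""

    def __init__(self, num_sets: int = NUM_SETS, ways: int = WAYS) -> None:
        self.num_sets, self.ways = num_sets, ways
        self.sets: list[list[CommittedStore]] = [[] for _ in range(num_sets)]

    def insert(self, st: CommittedStore) -> None:
        s = self.sets[st.addr % self.num_sets]
        s[:] = [e for e in s if e.addr != st.addr]  # keep only the youngest store per address
        if len(s) == self.ways:
            s.pop(0)  # evict the oldest entry in the set (FIFO, an assumption)
        s.append(st)

    def lookup(self, addr: int) -> Optional[CommittedStore]:
        for e in self.sets[addr % self.num_sets]:
            if e.addr == addr:
                return e
        return None


class TaggedSSBF:
    """Address-tagged, set-associative store sequence filter (sketch).

    A miss returns the largest SSN ever evicted from that set, so evictions
    can only cause extra re-executions, never a missed violation.
    """

    def __init__(self, num_sets: int = NUM_SETS, ways: int = WAYS) -> None:
        self.num_sets, self.ways = num_sets, ways
        self.sets: list[dict[int, int]] = [{} for _ in range(num_sets)]  # tag -> SSN
        self.evicted_max = [0] * num_sets

    def record_store(self, addr: int, ssn: int) -> None:
        idx, tag = addr % self.num_sets, addr // self.num_sets
        s = self.sets[idx]
        s.pop(tag, None)
        if len(s) == self.ways:
            victim = min(s, key=s.__getitem__)  # evict the entry with the oldest SSN
            self.evicted_max[idx] = max(self.evicted_max[idx], s.pop(victim))
        s[tag] = ssn

    def last_store_ssn(self, addr: int) -> int:
        idx, tag = addr % self.num_sets, addr // self.num_sets
        return self.sets[idx].get(tag, self.evicted_max[idx])


class ToyCore:
    """Stores commit in order; each load remembers the SSN of the newest store it could have observed."""

    def __init__(self) -> None:
        self.memory: dict[int, int] = {}  # stands in for the L1 data cache
        self.window = ActiveStoreWindow()
        self.ssbf = TaggedSSBF()
        self.next_ssn = 1

    def commit_store(self, addr: int, data: int) -> None:
        ssn, self.next_ssn = self.next_ssn, self.next_ssn + 1
        self.memory[addr] = data  # assumption: committed stores also write the cache
        self.window.insert(CommittedStore(addr, data, ssn))
        self.ssbf.record_store(addr, ssn)

    def execute_load(self, addr: int) -> tuple[int, int]:
        """Return (value, seen_ssn); far-forward from the window on a hit."""
        hit = self.window.lookup(addr)
        if hit is not None:
            return hit.data, hit.ssn
        return self.memory.get(addr, 0), self.next_ssn - 1

    def must_reexecute(self, addr: int, seen_ssn: int) -> bool:
        """At load commit: a later-committed store to addr means the value may be stale."""
        return self.ssbf.last_store_ssn(addr) > seen_ssn


core = ToyCore()
core.commit_store(0x10, 7)
value, seen = core.execute_load(0x10)  # far-forwarded from the window: value 7, seen 1
core.commit_store(0x10, 9)             # an older in-flight store commits before the load does
print(core.must_reexecute(0x10, seen))  # True: the load must re-execute in order
```

Because the filter is tagged, a hit gives the exact SSN of the last committed store to that address, so most loads skip re-execution. A miss falls back to the largest SSN evicted from the set; the worst case of a small filter is therefore an unnecessary re-execution, not an undetected store-load violation.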

References

  1. Wulf W A, McKee S A. Hitting the memory wall: Implications of the obvious. Computer Architecture News, 1995, 23(1): 20–24.

  2. Park I, Ooi C L, Vijaykumar T N. Reducing design complexity of the load/store queue. In Proc. the 36th MICRO, San Diego, USA, Dec. 3-5, 2003, pp.411–422.

  3. Gandhi A, Akkary H, Rajwar R, Srinivasan S T, Lai K. Scalable load and store processing in latency tolerant processors. In Proc. the 32nd ISCA, Madison, USA, June 4-8, 2005, pp.446–457.

  4. Pericàs M, Cristal A, Cazorla F J, González R, Veidenbaum A, Jiménez D A, Valero M. A two-level load/store queue based on execution locality. In Proc. the 35th ISCA, Beijing, China, June 21-25, 2008, pp.25–36.

  5. Sethumadhavan S, Desikan R, Burger D, Moore C R, Keckler S W. Scalable hardware memory disambiguation for high ILP processors. In Proc. the 36th MICRO, San Diego, USA, Dec. 3-5, 2003, pp.399–410.

  6. Baugh L, Zilles C. Decomposing the load-store queue by function for power reduction and scalability. IBM Journal of Research and Development, 2006, 50(2/3): 287–297.

  7. Sha T T, Martin M M K, Roth A. Scalable store-load forwarding via store queue index prediction. In Proc. the 38th MICRO, Barcelona, Spain, Nov. 12-16, 2005, pp.159–170.

  8. Stone S S, Woley K M, Frank M I. Address-indexed memory disambiguation and store-to-load forwarding. In Proc. the 38th MICRO, Barcelona, Spain, Nov. 12-16, 2005, pp.171–182.

  9. Roesner F, Burger D, Keckler S W. Counting dependence predictors. In Proc. the 35th ISCA, Beijing, China, June 21-25, 2008, pp.215–226.

  10. Sha T T, Martin M M K, Roth A. NoSQ: Store-load communication without a store queue. In Proc. the 39th MICRO, Orlando, USA, Dec. 9-13, 2006, pp.285–296.

  11. Subramaniam S, Loh G H. Fire-and-forget: Load/store scheduling with no store queue at all. In Proc. the 39th MICRO, Orlando, USA, Dec. 9-13, 2006, pp.273–284.

  12. Garg A, Rashid M W, Huang M. Slackened memory dependence enforcement: Combining opportunistic forwarding with decoupled verification. In Proc. the 33rd ISCA, Boston, USA, June 17-21, 2006, pp.142–154.

  13. Sethumadhavan S, Roesner F, Emer J S, Burger D, Keckler S W. Late-binding: Enabling unordered load-store queue. In Proc. the 34th ISCA, San Diego, USA, June 9-13, 2007, pp.347–357.

  14. Huang R, Garg A, Huang M. Software hardware cooperative memory disambiguation. In Proc. the 12th HPCA, Austin, USA, Feb. 11-15, 2006, pp.244–253.

  15. Cain H W, Lipasti M H. Memory ordering: A value-based approach. In Proc. the 31st ISCA, München, Germany, June 19-23, 2004, pp.90–101.

  16. Roth A. Store vulnerability window: Re-execution filtering for enhanced load optimization. In Proc. the 32nd ISCA, Madison, USA, June 4-8, 2005, pp.458–468.

  17. Chrysos G Z, Emer J S. Memory dependence prediction using store sets. In Proc. the 25th ISCA, Barcelona, Spain, June 27-July 1, 1998, pp.142–153.

  18. Moshovos A, Breach S E, Vijaykumar T N, Sohi G S. Dynamic speculation and synchronization of data dependences. In Proc. the 24th ISCA, Denver, USA, June 2-4, 1997, pp.181–193.

  19. Hilton A, Roth A. Decoupled store completion/silent deterministic replay: Enabling scalable data memory for CPR/CFP processors. In Proc. the 36th ISCA, Austin, USA, June 20-24, 2009, pp.245–254.

  20. Hilton A, Roth A. BOLT: Energy-efficient out-of-order latency-tolerant execution. In Proc. the 16th HPCA, Bangalore, India, Jan. 9-14, 2010, pp.1–12.

  21. Mutlu O, Stark J, Wilkerson C, Patt Y N. Runahead execution: An alternative to very large instruction windows for out-of-order processors. In Proc. the 9th HPCA, Anaheim, USA, Feb. 8-12, 2003, pp.129–140.

  22. Akkary H, Rajwar R, Srinivasan S T. Checkpoint processing and recovery: Towards scalable large instruction window processors. In Proc. the 36th MICRO, San Diego, USA, Dec. 3-5, 2003, pp.423–434.

Author information

Correspondence to Xiao-Yin Wang.

Additional information

This work was supported by the National High Technology Research and Development 863 Program of China under Grant No. 2009ZX01029-001-002 and the Postdoctoral Science Foundation of China under Grant No. 20110490208.

About this article

Cite this article

Zhang, ZH., Wang, XY., Tong, D. et al. Active Store Window: Enabling Far Store-Load Forwarding with Scalability and Complexity-Efficiency. J. Comput. Sci. Technol. 27, 769–780 (2012). https://doi.org/10.1007/s11390-012-1263-7

  • DOI: https://doi.org/10.1007/s11390-012-1263-7