Abstract
Most latency-hiding studies for Distributed Shared Memory (DSM) systems use prefetching, while few explore the use of poststoring. This study develops a runtime poststoring scheme (PST) and an application-specific prefetching scheme (PFH), and compares the performance gains obtained with each. Both the PST and PFH schemes produced scalable reductions in loop execution times.
This work was supported in part by NSF Young Investigator Award CCR-9357840.
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tumuluri, C., Choudhary, A.N. (1996). Scalable software latency hiding schemes: Evaluation of the poststore and prefetch options. In: Bougé, L., Fraigniaud, P., Mignotte, A., Robert, Y. (eds) Euro-Par'96 Parallel Processing. Euro-Par 1996. Lecture Notes in Computer Science, vol 1124. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0024740
DOI: https://doi.org/10.1007/BFb0024740
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61627-6
Online ISBN: 978-3-540-70636-6
eBook Packages: Springer Book Archive