Abstract
Cache optimizations use code transformations to increase the locality of memory accesses and use prefetching techniques to hide memory latency. For best performance, the hardware prefetching units of processors should be complemented with software prefetch instructions. A cache simulation extended with a model of a hardware prefetcher is presented and used to run the code of a 3D multigrid solver. In this way, cache misses that the hardware prefetcher does not predict can be identified and then handled by inserting software prefetch instructions. Additionally, Interleaved Block Prefetching (IBPF) is presented. Measurements show the potential of this technique.
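As a rough illustration of what "inserting software prefetch instructions" into a multigrid code can look like, the sketch below shows one Jacobi sweep of a 7-point smoother on an n x n x n grid with a single explicit prefetch. It is not the authors' code and not IBPF. It assumes GCC or Clang (for `__builtin_prefetch`), a row-major grid with x varying fastest, and a hypothetical prefetch distance `PF_DIST`. The names `IDX`, `PF_DIST` and `jacobi_sweep_prefetch` are illustrative only.

```c
#include <stddef.h>

/* Index into an n x n x n grid stored in (z, y, x) order, x fastest. */
#define IDX(n, z, y, x) ((((size_t)(z) * (n)) + (size_t)(y)) * (n) + (size_t)(x))

/* Hypothetical prefetch distance in grid points; must be tuned per machine. */
#define PF_DIST 64

/* One Jacobi sweep of a 7-point smoother for -Laplace(u) = f with mesh
   width h. Only interior points of u_new are written; boundary points
   are left unchanged. */
void jacobi_sweep_prefetch(size_t n, double h, const double *u,
                           const double *f, double *u_new)
{
    const double h2 = h * h;
    for (size_t z = 1; z + 1 < n; z++)
        for (size_t y = 1; y + 1 < n; y++)
            for (size_t x = 1; x + 1 < n; x++) {
                size_t i = IDX(n, z, y, x);
#if defined(__GNUC__)
                /* u[i + n*n] (plane z+1) is the stream furthest ahead in
                   memory. Request the element of that plane needed about
                   PF_DIST iterations later: read access (0), high temporal
                   locality hint (3), because points of this plane are read
                   again in the next two z iterations. */
                size_t ahead = i + n * n + PF_DIST;
                if (ahead < n * n * n)
                    __builtin_prefetch(&u[ahead], 0, 3);
#endif
                u_new[i] = (f[i] * h2
                            + u[i - 1] + u[i + 1]
                            + u[i - n] + u[i + n]
                            + u[i - n * n] + u[i + n * n]) / 6.0;
            }
}
```

The prefetch is only a hint and does not change the computed values. On real processors a unit-stride stream like this one may already be covered by the hardware prefetcher, in which case the explicit instruction only adds overhead. Deciding which accesses still miss despite the hardware prefetcher is exactly what a cache simulation with a prefetcher model can help with.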
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Weidendorfer, J., Trinitis, C. (2006). Cache Optimizations for Iterative Numerical Codes Aware of Hardware Prefetching. In: Dongarra, J., Madsen, K., Waśniewski, J. (eds) Applied Parallel Computing. State of the Art in Scientific Computing. PARA 2004. Lecture Notes in Computer Science, vol 3732. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11558958_111
DOI: https://doi.org/10.1007/11558958_111
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29067-4
Online ISBN: 978-3-540-33498-9
eBook Packages: Computer Science, Computer Science (R0)