
Exploring Strategies to Improve Locality Across Many-Core Affinities

  • Conference paper
Euro-Par 2021: Parallel Processing Workshops (Euro-Par 2021)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13098)


Abstract

Several recent top-ranked systems in the Top500 include many-core chips with complex memory systems, including intermediate levels of memory, multiple memory channels, and explicit affinity of specific memory channels to specific sub-blocks of cores. Creating codes that use these features efficiently is thus a significant challenge. This paper uses Intel’s Knights Landing (KNL) processor as a testbed, as it includes both intermediate memory and multiple architectural knobs to adjust affinity. It uses a 2D Fast Fourier Transform (FFT) as a test case to explore which combination of architectural and algorithmic techniques is of most benefit. Several codes are used, including the state-of-the-art FFT codes FFTW and MKL, along with two additional simple parallel 2D FFT codes that explore explicit options. The conclusions are that intermediate memory does provide a significant boost and that some architectural modes in the memory subsystem are better suited to FFT than others.
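For readers unfamiliar with why a 2D FFT stresses memory locality, the following is a minimal sketch of the textbook row-column decomposition. It is not the authors' code and not how FFTW or MKL are implemented; the function names (fft1d, transpose, fft2d, dft2d_naive) are hypothetical, and both grid dimensions are assumed to be powers of two. The row pass reads contiguous data, while the column pass needs either strided access or an explicit transpose, which is the kind of data movement that memory affinity settings may influence.

```python
import cmath


def fft1d(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def fft2d(grid):
    """Row-column 2D FFT: row FFTs, transpose, row FFTs, transpose back."""
    rows_done = [fft1d(row) for row in grid]          # contiguous accesses
    cols_as_rows = transpose(rows_done)               # the locality-hostile step
    cols_done = [fft1d(row) for row in cols_as_rows]  # contiguous again
    return transpose(cols_done)


def dft2d_naive(grid):
    """Direct O(N^4) 2D DFT, used only to check fft2d on tiny inputs."""
    r, c = len(grid), len(grid[0])
    out = [[0j] * c for _ in range(r)]
    for u in range(r):
        for v in range(c):
            s = 0j
            for y in range(r):
                for x in range(c):
                    angle = -2 * cmath.pi * (u * y / r + v * x / c)
                    s += grid[y][x] * cmath.exp(1j * angle)
            out[u][v] = s
    return out
```

In this sketch each independent row FFT could be handed to a separate thread; the transpose is where every thread touches data produced by every other thread, so it is the natural place to reason about which memory channel or intermediate memory holds the data.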



Corresponding authors

Correspondence to Neil Butcher or Peter Kogge.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Butcher, N., Kogge, P. (2022). Exploring Strategies to Improve Locality Across Many-Core Affinities. In: Chaves, R., et al. Euro-Par 2021: Parallel Processing Workshops. Euro-Par 2021. Lecture Notes in Computer Science, vol 13098. Springer, Cham. https://doi.org/10.1007/978-3-031-06156-1_3

  • DOI: https://doi.org/10.1007/978-3-031-06156-1_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-06155-4

  • Online ISBN: 978-3-031-06156-1

  • eBook Packages: Computer Science (R0)
