Spinning-on-coherency: A new VSM optimisation for write-invalidate

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1067)

Abstract

This paper introduces spinning-on-coherency (SOC), a technique for virtual shared memory (VSM) that enables latency hiding of remote reads and the removal of related synchronisation points. Coherence bits are hardware tags associated with addresses; they record local access permissions (such as read, write, invalid). In SOC, a user thread spins on the coherence bits associated with an address until the new data value is asynchronously propagated and the address becomes valid. Data propagation occurs when another node issues an update after having written the new value. Performance improvements are demonstrated for two codes, representing the core communication found in Shallow (a well-known numerical weather prediction benchmark) and CG (from the NAS Parallel Benchmarks). These are run on a 30-node prototype distributed-memory architecture (EDS) with invalidation-based, sequentially consistent VSM. SOC is also applicable to other consistency models and directory schemes, whether in hardware or software, and complements other VSM optimisations. Currently such optimisation is performed by the programmer, but there is much scope for automating this process within a compiler.
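The spin-then-read behaviour described in the abstract can be sketched as a small software simulation. This is only an illustration under stated assumptions, not the EDS hardware mechanism or the authors' implementation: the names `Coherence`, `Line`, `soc_read`, `propagate_update` and `demo` are hypothetical, two Python threads stand in for two nodes, and a plain attribute stands in for the hardware coherence tag.

```python
import threading
import time
from enum import Enum


class Coherence(Enum):
    """Local access permission recorded for one address (hypothetical encoding)."""
    INVALID = 0
    READ = 1
    WRITE = 2


class Line:
    """One shared address as seen by the reading node: local copy plus its tag."""

    def __init__(self):
        self.state = Coherence.INVALID  # the local copy starts out invalid
        self.value = None


def soc_read(line, poll_interval=0.0):
    """Spin on the coherence tag until the local copy is valid, then read it.

    Returns the value and the number of spin iterations that were needed.
    """
    spins = 0
    while line.state is Coherence.INVALID:
        spins += 1
        time.sleep(poll_interval)  # yield so the writer thread can make progress
    return line.value, spins


def propagate_update(line, new_value):
    """Model the asynchronous update issued by the writing node.

    The data is installed before the tag is changed, so a spinning reader
    never observes a valid tag together with a stale value.
    """
    line.value = new_value
    line.state = Coherence.READ


def demo():
    """Run one reader that spins while the writer is still producing the value."""
    line = Line()
    result = {}

    def reader():
        result["value"], result["spins"] = soc_read(line)

    t = threading.Thread(target=reader)
    t.start()
    time.sleep(0.01)            # the writer is busy computing the new value
    propagate_update(line, 42)  # write, then update the reader's copy
    t.join()
    return result["value"]


if __name__ == "__main__":
    print(demo())
```

In this sketch the reader needs no separate barrier or lock: the validity of the address itself signals that the data has arrived, which is the synchronisation point the abstract says SOC removes.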

This work was funded by the U.K. Meteorological Office and the ESPRIT SODA project.

Editor information

Heather Liddell, Adrian Colbrook, Bob Hertzberger, Peter Sloot

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nisbet, A.P., Ford, R.W. (1996). Spinning-on-coherency: A new VSM optimisation for write-invalidate. In: Liddell, H., Colbrook, A., Hertzberger, B., Sloot, P. (eds) High-Performance Computing and Networking. HPCN-Europe 1996. Lecture Notes in Computer Science, vol 1067. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61142-8_628

  • DOI: https://doi.org/10.1007/3-540-61142-8_628

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-61142-4

  • Online ISBN: 978-3-540-49955-8

  • eBook Packages: Springer Book Archive
