Abstract
The growing use of accelerators in heterogeneous systems is forcing a paradigm shift in programming models. Low-level APIs for accelerator programming are tedious and unintuitive for casual programmers. To tackle this problem, recent approaches have focused on high-level directive-based models, with standardization efforts in OpenACC and in the accelerator directives of the OpenMP 4.0 release candidate. The data-management pragmas in these models raise coherence issues in accelerator memory that affect code correctness. To address this, we propose a design for a directory, together with a reduced runtime ABI, that correctly handles data management in these standards. Our design supports multi-accelerator systems. With this directory, we also propose a way to correctly handle pragmas on partially overlapping data intervals.
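To make the partially overlapping case concrete, the following is a minimal sketch of one way a directory could track device-resident intervals. It is not the authors' directory or runtime ABI: the names `Segment`, `IntervalDirectory`, `enter`, and `leave` are hypothetical, and the reference-counting policy is an assumption chosen only for illustration. Intervals are half-open byte ranges `[start, end)`.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: int  # inclusive
    end: int    # exclusive
    refs: int   # number of open data regions covering this segment


class IntervalDirectory:
    """Toy directory (assumed design): disjoint, sorted segments with ref counts."""

    def __init__(self):
        self.segments = []

    def _split_at(self, point):
        # Cut the segment that strictly contains `point` into two pieces.
        for i, seg in enumerate(self.segments):
            if seg.start < point < seg.end:
                self.segments[i:i + 1] = [
                    Segment(seg.start, point, seg.refs),
                    Segment(point, seg.end, seg.refs),
                ]
                return

    def enter(self, start, end):
        """Open a data region; return sub-intervals not yet on the device."""
        self._split_at(start)
        self._split_at(end)
        missing = []
        cursor = start
        for seg in self.segments:
            if seg.end <= start or seg.start >= end:
                continue  # segment lies outside the requested interval
            if cursor < seg.start:
                missing.append((cursor, seg.start))  # gap: needs a transfer
            seg.refs += 1  # already present: only bump the count
            cursor = seg.end
        if cursor < end:
            missing.append((cursor, end))
        self.segments.extend(Segment(a, b, 1) for a, b in missing)
        self.segments.sort(key=lambda s: s.start)
        return missing

    def leave(self, start, end):
        """Close a data region; return sub-intervals that can be released."""
        self._split_at(start)
        self._split_at(end)
        released, kept = [], []
        for seg in self.segments:
            if start <= seg.start and seg.end <= end:
                seg.refs -= 1
            if seg.refs == 0:
                released.append((seg.start, seg.end))
            else:
                kept.append(seg)
        self.segments = kept
        return released
```

In this sketch, opening `[0, 100)` and then `[50, 150)` transfers only `[100, 150)` for the second region, and closing the first region releases only `[0, 50)`, because `[50, 100)` is still covered by the second.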
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jaeger, J., Carribault, P., Pérache, M. (2014). Data-Management Directory for OpenMP 4.0 and OpenACC. In: an Mey, D., et al. Euro-Par 2013: Parallel Processing Workshops. Euro-Par 2013. Lecture Notes in Computer Science, vol 8374. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54420-0_17
DOI: https://doi.org/10.1007/978-3-642-54420-0_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54419-4
Online ISBN: 978-3-642-54420-0
eBook Packages: Computer Science, Computer Science (R0)