Abstract
Numerical techniques used for describing many-body systems, such as the Coupled Cluster (CC) methods of the quantum chemistry package NWChem, are of extreme interest to the computational chemistry community in fields such as catalytic reactions, solar energy, and biomass conversion. In spite of their importance, many of these computationally intensive algorithms have traditionally been thought of in a fairly linear fashion, or are parallelised in coarse chunks.
In this paper, we present our effort to convert NWChem’s CC code into a dataflow-based form that is capable of utilizing the task scheduling system PaRSEC (Parallel Runtime Scheduling and Execution Controller), a software package designed to enable high performance computing at scale. We discuss the modularity of our approach and explain how the PaRSEC-enabled dataflow version of the subroutines integrates seamlessly into the NWChem codebase. Furthermore, we argue that the CC algorithms can be easily decomposed into finer-grained tasks (compared to the original version of NWChem), and that data distribution and load balancing are decoupled and can be tuned independently. We demonstrate performance acceleration by more than a factor of two in the execution of the entire CC component of NWChem, concluding that the utilization of dataflow-based execution for CC methods enables more efficient and scalable computation.
Notes
1. All subroutines with prefix “icsd_t2_” and suffixes: 2_2_2_2(), 2_2_3(), 2_4_2(), 2_5_2(), 2_6(), lt2_3x(), 4_2_2(), 4_3(), 4_4(), 5_2(), 5_3(), 6_2_2(), 6_3(), 7_2(), 7_3(), vt1ic_1_2(), 8(), 2_2_2(), 2_4(), 2_5(), 4_2(), 5(), 6_2(), vt1ic_1, 7(), 2_2(), 4(), 6(), 2().
Acknowledgment
This material is based upon work supported in part by the Air Force Office of Scientific Research under AFOSR Award No. FA9550-12-1-0476, and the DOE Office of Science, Advanced Scientific Computing Research, under award No. DE-SC0006733 “SUPER - Institute for Sustained Performance, Energy and Resilience”. A portion of this research was performed using EMSL, a DOE Office of Science User Facility sponsored by the Office of Biological and Environmental Research and located at Pacific Northwest National Laboratory.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Jagode, H., Danalis, A., Bosilca, G., Dongarra, J. (2016). Accelerating NWChem Coupled Cluster Through Dataflow-Based Execution. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K., Kitowski, J., Wiatr, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2015. Lecture Notes in Computer Science, vol 9573. Springer, Cham. https://doi.org/10.1007/978-3-319-32149-3_35
DOI: https://doi.org/10.1007/978-3-319-32149-3_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32148-6
Online ISBN: 978-3-319-32149-3
eBook Packages: Computer Science (R0)