ABSTRACT
Achieving high performance on modern systems is challenging. Even with a detailed profile from a performance tool, writing or refactoring a program to remove its performance issues remains a daunting task for application programmers: it demands considerable program optimization expertise that is often system specific.
Vendors often provide detailed optimization guides to assist programmers in this process. However, these guides are frequently hundreds of pages long, making it difficult for application programmers to master and memorize all the rules and guidelines and to apply them properly to a specific problem instance.
In this work, we develop a framework named Egeria to alleviate the difficulty. Through Egeria, one can easily construct an advising tool for a certain high performance computing (HPC) domain (e.g., GPU programming) by providing Egeria with an optimization guide or other related documents for the target domain. An advising tool produced by Egeria provides a concise list of essential rules automatically extracted from the documents. At the same time, the advising tool serves as a question-answer agent that can interactively offer suggestions for specific optimization questions. Egeria is made possible through a distinctive multi-layered design that leverages natural language processing techniques and extends them with knowledge of HPC domains and of how to extract information relevant to code optimization. Experiments on CUDA, OpenCL, and Xeon Phi programming guides demonstrate, both qualitatively and quantitatively, the usefulness of Egeria for HPC.
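To make the two roles of such an advising tool concrete (a list of extracted rules plus a question-answer agent), the following is a minimal sketch only. It is not Egeria's implementation: the cue-word list, the sample guide text, and every function name are hypothetical, and plain TF-IDF cosine ranking stands in for the multi-layered design the abstract describes.

```python
import math
import re
from collections import Counter

# Hypothetical cue words that often mark advice in optimization guides.
ADVICE_CUES = {"should", "avoid", "recommended", "prefer", "ensure", "minimize"}

# Hypothetical excerpt standing in for a vendor optimization guide.
GUIDE = (
    "GPUs have many cores. "
    "Global memory accesses should be coalesced whenever possible. "
    "Shared memory is on chip. "
    "Avoid divergent branches within a warp. "
    "The device was released recently."
)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_rules(document):
    """Keep only sentences that contain at least one advice cue word."""
    return [s for s in split_sentences(document)
            if ADVICE_CUES & set(tokenize(s))]


def build_index(sentences):
    """Return (idf table, list of TF-IDF vectors), one vector per sentence."""
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def answer(question, rules, top_k=1):
    """Rank extracted rules by TF-IDF cosine similarity to the question."""
    idf, vectors = build_index(rules)
    counts = Counter(tokenize(question))
    query = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    ranked = sorted(zip(rules, vectors),
                    key=lambda pair: cosine(query, pair[1]),
                    reverse=True)
    return [rule for rule, _ in ranked[:top_k]]


if __name__ == "__main__":
    rules = extract_rules(GUIDE)
    for rule in rules:
        print("RULE:", rule)
    print("ANSWER:", answer("How can I handle divergent branches in a warp?", rules))
```

The sketch matches words exactly, so a question phrased with "divergence" would not match a rule that says "divergent"; closing that kind of gap is one reason a real tool would need richer language processing and domain knowledge.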
Index Terms
- Egeria: a framework for automatic synthesis of HPC advising tools through multi-layered natural language processing