ABSTRACT
As multicore chips scale to higher processor counts, communication between cores becomes increasingly important. When a single application is split across multiple cores connected through a relatively slow network, the amount of communication required has a major effect on performance. If the application can be partitioned so that communication between threads is minimised, or so that placement on non-uniform networks takes communication into account, a significant performance boost can be obtained. To do this effectively, the communication streams inside the application must be known. In this paper, we introduce a profiling tool for Java that measures data flows between methods. It constructs a communication graph, which combines a traditional call graph with data flow information.
The overhead of profiling is brought down by a factor of 15 through the use of reservoir sampling. We prove that this can be done with a limited decrease in accuracy.
This way, we can quickly estimate communication flows, the critical information needed to make an efficient communication-aware parallelisation.
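The abstract does not say which reservoir-sampling variant the tool uses or which events it samples. As an illustration only, the sketch below shows the classic Algorithm R in Java: it keeps a uniform random sample of fixed size from a stream whose length is not known in advance. The class name `ReservoirSampler`, its methods, and the seed parameter are hypothetical and do not come from the paper.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not the paper's implementation: keeps a uniform
// random sample of at most `capacity` items from a stream of unknown length.
final class ReservoirSampler<T> {
    private final int capacity;
    private final List<T> reservoir;
    private final Random rng;
    private int seen = 0; // number of items offered so far

    ReservoirSampler(int capacity, long seed) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.reservoir = new ArrayList<>(capacity);
        this.rng = new Random(seed);
    }

    // Algorithm R: the first `capacity` items fill the reservoir; after that,
    // the n-th item replaces a random slot with probability capacity / n.
    void offer(T item) {
        seen++;
        if (reservoir.size() < capacity) {
            reservoir.add(item);
            return;
        }
        int slot = rng.nextInt(seen); // uniform in [0, seen)
        if (slot < capacity) {
            reservoir.set(slot, item);
        }
    }

    // Read-only view of the current sample.
    List<T> sample() {
        return Collections.unmodifiableList(reservoir);
    }

    int seen() {
        return seen;
    }
}
```

In a profiler of the kind described, one plausible use (an assumption, not a claim about the tool) would be to call `offer` for each candidate event and track data flow in detail only for the events currently held in the reservoir, so that the per-event cost stays small while the retained sample remains uniform.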
Index Terms
- Efficient measurement of data flow enabling communication-aware parallelisation