ABSTRACT
We present Dubstep, a novel system that uses the find-transform-navigate paradigm to automatically explore new parallelization opportunities in already parallelized (fully synchronized) programs by opportunistically relaxing synchronization primitives. These transformations generate a space of alternative, possibly nondeterministic, parallel programs with varying performance and accuracy characteristics. The freedom to generate parallel programs whose output may differ (within statistical accuracy bounds) from the output of the original program enables a significantly larger optimization space. Dubstep then searches this space to find a parallel program that will, with high likelihood, produce outputs acceptably close to those the original, fully synchronized parallel program would have produced.
Initial results from our benchmark application show that Dubstep can generate acceptably accurate and efficient versions of a parallel program that occupy different positions in the performance/accuracy trade-off space.
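The search step described above, which relaxes synchronization and keeps only variants whose outputs are, with high likelihood, acceptably close to the fully synchronized output, can be illustrated with a small simulation. This is a minimal sketch under explicit assumptions, not Dubstep's implementation: the toy shared-counter model, the names `run_variant`, `passes_accuracy_test`, and `greedy_navigate`, the parameter values, the greedy search order, and the use of a one-sided Hoeffding bound as the statistical test are all hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical toy model (not Dubstep's benchmark): a parallel loop performs
# N_UPDATES increments of a shared counter. Each list entry describes one
# synchronization site; relaxing site i saves SAVINGS[i] time units but lets
# each update be lost to a race with probability LOSS_PROB[i].
N_UPDATES = 100
SAVINGS = [5.0, 3.0, 8.0]
LOSS_PROB = [0.0, 0.002, 0.5]


def run_variant(relaxed, rng):
    """Simulate one execution of the variant with the given sites relaxed."""
    total = 0
    for _ in range(N_UPDATES):
        lost = any(rng.random() < LOSS_PROB[i] for i in relaxed)
        if not lost:
            total += 1
    return total


def hoeffding_lower_bound(successes, trials, delta):
    """One-sided Hoeffding bound: with probability >= 1 - delta, the true
    success probability is at least the returned value."""
    return successes / trials - math.sqrt(math.log(1.0 / delta) / (2.0 * trials))


def passes_accuracy_test(relaxed, rng, reference, max_distortion=0.05,
                         target=0.9, delta=0.05, trials=200):
    """Accept the variant only if, with confidence 1 - delta, at least a
    `target` fraction of its runs stay within `max_distortion` relative
    error of the fully synchronized reference output."""
    successes = 0
    for _ in range(trials):
        distortion = abs(run_variant(relaxed, rng) - reference) / abs(reference)
        if distortion <= max_distortion:
            successes += 1
    return hoeffding_lower_bound(successes, trials, delta) >= target


def greedy_navigate(seed=0):
    """Hypothetical greedy search: try relaxing sites in order of decreasing
    savings, keeping each relaxation only if the combined variant passes."""
    rng = random.Random(seed)
    reference = run_variant(frozenset(), rng)  # fully synchronized output
    relaxed = frozenset()
    for site in sorted(range(len(SAVINGS)), key=lambda i: -SAVINGS[i]):
        candidate = relaxed | {site}
        if passes_accuracy_test(candidate, rng, reference):
            relaxed = candidate
    return relaxed
```

In this toy setting the search rejects site 2, whose lost updates distort the result by roughly 50%, and keeps sites 0 and 1. A greedy order is only one way to traverse the space; an exhaustive search or a different statistical test could be substituted without changing the overall structure.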
Index Terms
- Dancing with uncertainty