Abstract
State-of-the-art run-time systems are a poor match for diverse, dynamic distributed applications because they are designed to support a wide variety of applications, with little customization to the specific requirements of any one of them. Little or no guiding information flows directly from the application to the run-time system to allow the latter to fully tailor its services to the application. As a result, performance is disappointing. To address this problem, we propose application-centric computing, or Smart Applications (SmartApps). In the executable of a smart application, the compiler embeds most run-time system services together with a performance-optimizing feedback loop that monitors the application’s performance and adaptively reconfigures the application and the OS/hardware platform. At run-time, after incorporating the code’s input and the system’s resources and state, the SmartApp performs a global optimization. This optimization is instance-specific and thus much more tractable than a global generic optimization across application, OS, and hardware. The resulting code and resource customization should lead to major speedups. In this paper, we first describe the overall architecture of SmartApps and then present the achievements to date: run-time optimizations, performance modeling, and moderately reconfigurable hardware.
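The monitor-and-reconfigure loop described above can be illustrated with a minimal sketch. It is not the SmartApps implementation: the names `adaptive_run`, `cost_model`, and `tolerance`, the two variants, and the cost formulas are all hypothetical, and a deterministic cost model stands in for real performance measurement.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Variant = Callable[[Sequence[int]], int]


def adaptive_run(
    variants: Dict[str, Variant],
    cost_model: Callable[[str, Sequence[int]], float],
    inputs: Sequence[Sequence[int]],
    tolerance: float = 1.5,
) -> List[Tuple[str, int]]:
    """Run every input with the selected variant; re-tune when the cost drifts."""
    chosen = None
    baseline = 0.0
    history = []
    for data in inputs:
        if chosen is None:
            # Instance-specific tuning: evaluate every variant on this input
            # and keep the cheapest one.
            costs = {name: cost_model(name, data) for name in variants}
            chosen = min(costs, key=costs.get)
            baseline = costs[chosen]
        result = variants[chosen](data)
        history.append((chosen, result))
        # Monitoring: if the observed cost exceeds the baseline by more than
        # the tolerance factor, schedule a re-tuning for the next input.
        if cost_model(chosen, data) > tolerance * baseline:
            chosen = None
    return history


# Hypothetical variants of a sum reduction.
variants = {
    "serial": lambda d: sum(d),
    "chunked": lambda d: sum(sum(d[i:i + 2]) for i in range(0, len(d), 2)),
}


def cost_model(name: str, data: Sequence[int]) -> float:
    # Assumed costs: "chunked" pays a fixed overhead but scales better.
    return float(len(data)) if name == "serial" else 4.0 + len(data) / 4.0


history = adaptive_run(
    variants, cost_model, [[1, 2], [3, 4], list(range(100)), list(range(100))]
)
print([name for name, _ in history])  # ['serial', 'serial', 'serial', 'chunked']
```

In this sketch the small inputs favor the serial variant. The first large input exceeds the tolerance and triggers re-tuning, so the following input runs with the chunked variant.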
Research supported in part by NSF CAREER Award CCR-9734471, NSF CAREER Award CCR-9624315, NSF Grant ACI-9872126, NSF-NGS EIA-9975018, DOE ASCI ASAP Level 2 Grant B347886, and Hewlett-Packard Equipment Grants.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rauchwerger, L., Amato, N.M., Torrellas, J. (2001). SmartApps: An Application Centric Approach to High Performance Computing. In: Midkiff, S.P., et al. Languages and Compilers for Parallel Computing. LCPC 2000. Lecture Notes in Computer Science, vol 2017. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45574-4_6
DOI: https://doi.org/10.1007/3-540-45574-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42862-6
Online ISBN: 978-3-540-45574-5
eBook Packages: Springer Book Archive