
Hardware-Assisted Characterization of NAS Benchmarks

Cluster Computing

Abstract

The UAH Logging, Trace Recording, and Analysis instrumentation (ULTRA) provides highly repeatable (0.0002% variation) application instruction counts for parallel programs. These counts are invariant to the communication network used, the number of processors used, and the MPI communication library used. ULTRA is implemented as an MPI profiling wrapper. It avoids the data collection system artifacts of time-based measurements by using instruction counts as the basic measure of work performed, and for each network operation it records the operation performed and the amount of data sent. These measurements can be scaled appropriately for various target architectures. ULTRA's instrumentation overhead is minimized by using the Pentium II processor's performance-monitoring hardware, allowing large, production-run applications to be characterized quickly. Traces of the NAS benchmarks representing 6.67 × 10^12 application instructions were generated by ULTRA. The application instructions executed per byte injected into the network and the instructions executed per message sent were computed from the traces. These values can be scaled by the expected processor performance to estimate the minimum network performance required to support the programs. Time-based measurements cannot be used for this purpose because of measurement artifacts caused by the background processes and the communication network of the data collection system.
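The scaling step described in the abstract can be illustrated with a short sketch. It is not ULTRA's code: the `TraceSummary` type, the function names, and every number below are hypothetical, and the simple division used for the scaling is an assumption about how the two ratios would be applied to an expected processor speed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceSummary:
    """Per-process totals, as a trace post-processor might report them (hypothetical)."""
    instructions: int   # application instructions executed
    bytes_sent: int     # bytes injected into the network
    messages_sent: int  # messages sent


def instructions_per_byte(t: TraceSummary) -> float:
    """Application instructions executed per byte injected into the network."""
    return t.instructions / t.bytes_sent


def instructions_per_message(t: TraceSummary) -> float:
    """Application instructions executed per message sent."""
    return t.instructions / t.messages_sent


def min_network_requirements(t: TraceSummary, instr_per_second: float):
    """Scale the two ratios by an expected processor speed.

    Assumption: a processor retiring `instr_per_second` instructions each
    second produces traffic at (speed / instructions-per-byte) bytes/s and
    (speed / instructions-per-message) messages/s, so the network must
    sustain at least those rates to avoid becoming the bottleneck.
    """
    bandwidth = instr_per_second / instructions_per_byte(t)
    message_rate = instr_per_second / instructions_per_message(t)
    return bandwidth, message_rate


# Hypothetical totals, NOT measurements from the paper.
example = TraceSummary(
    instructions=8_000_000_000,
    bytes_sent=40_000_000,
    messages_sent=10_000,
)

# Hypothetical target processor: 400 million instructions per second.
bandwidth, message_rate = min_network_requirements(example, instr_per_second=400e6)
print(f"bytes/s per processor:    {bandwidth:.0f}")
print(f"messages/s per processor: {message_rate:.0f}")
```

Because the inputs are instruction counts rather than elapsed times, the same trace can be rescaled for a faster or slower target processor by changing only `instr_per_second`.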




Cite this article

Cohen, W., Gaede, R. & Garrett, W. Hardware-Assisted Characterization of NAS Benchmarks. Cluster Computing 4, 189–196 (2001). https://doi.org/10.1023/A:1011442306605


  • DOI: https://doi.org/10.1023/A:1011442306605
