A Study of Performance Scalability by Parallelizing Loop Iterations on Multi-core SMPs

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2010)

Abstract

The challenge today is for software to exploit the parallelism that multi-core architectures make available. This can be done by rewriting the application, by exploiting the hardware capabilities directly, or by relying on the compiler and run-time tools to do the job for us. With the advent of multi-core architectures [1, 2], this problem is becoming increasingly relevant. Even today, few run-time tools exist that analyze the behavioral patterns of performance-critical applications and recompile them. Techniques such as OpenMP for shared-memory programs therefore remain useful for exploiting the parallelism in the machine. This work studies whether loop parallelization, both with and without loop transformations, is a good way to run scientific programs efficiently on such multi-core architectures. We found the results encouraging, and we strongly believe the approach could yield good results if implemented fully in a production compiler for multi-core architectures.
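As a rough illustration of the two cases the abstract distinguishes, the sketch below parallelizes one loop directly and another only after a loop interchange. It is not taken from the paper: the function names `scale_add` and `column_prefix_sum`, the grid size `N`, and the choice of interchange as the transformation are all assumptions made for this example. If the compiler is run without OpenMP support, the pragmas are ignored and the code runs sequentially with the same results.

```c
#include <assert.h>
#include <stddef.h>

#define N 64  /* hypothetical grid size, chosen only for this sketch */

/* No transformation needed: iteration i touches only x[i] and y[i],
   so the iterations are independent and can be divided among threads. */
void scale_add(int n, double a, const double *x, double *y)
{
    assert(x != NULL && y != NULL);
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Transformation needed: g[i][j] depends on g[i-1][j], so the i loop
   carries a dependence and must run in order, while the j iterations
   are independent. Written with i outermost, only the inner loop could
   be parallelized, once per row. Interchanging the loops moves j
   outermost, so a single parallel loop covers the whole nest and each
   thread walks its own columns sequentially. */
void column_prefix_sum(double g[N][N])
{
    #pragma omp parallel for
    for (int j = 0; j < N; j++)
        for (int i = 1; i < N; i++)
            g[i][j] += g[i - 1][j];
}
```

The interchange is legal here because the only dependence has distance (1, 0) in (i, j) order, which becomes (0, 1) after the swap and stays lexicographically positive. The trade-off is that the inner loop now strides down a column, which is less cache-friendly in row-major C arrays.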

References

  1. AMD Multi-core Products (2006), http://multicore.amd.com/en/products/

  2. Multi-core from Intel Products and Platforms (2006), http://www.intel.com/products/processor/

  3. OpenMP, http://www.openmp.org

  4. Wolfe, M.J.: Techniques for improving the inherent parallelism in programs. Technical Report 78-929, Department of Computer Science, University of Illinois at Urbana-Champaign (July 1990)

  5. Wolfe, M.: High Performance Compilers for Parallel Computing. Addison-Wesley, Reading

  6. Banerjee, U.K.: Loop Transformations for Restructuring Compilers: The Foundations. Kluwer Academic Publishers, Norwell (1993)

  7. Banerjee, U.K.: Loop Parallelization. Kluwer Academic Publishers, Norwell (1994)

  8. Pthreads reference, https://computing.llnl.gov/tutorials/pthreads/

  9. D'Hollander, E.H.: Partitioning and Labelling of loops by Unimodular Transformation. IEEE Transactions on Parallel and Distributed Systems 3(4) (1992)

  10. Saas, R., Mutka, M.: Enabling unimodular transformation. In: Supercomputing 1994, November 1994, pp. 753–762 (1994)

  11. Banerjee, U.: Unimodular Transformations of Double Loop. In: Advances in Languages and Compilers for Parallel Processing, pp. 192–219 (1991)

  12. Prakash, S.R., Srikant, Y.N.: An Approach to Global Data Partitioning for Distributed Memory Machines. In: IPPS/SPDP (1999)

  13. Prakash, S.R., Srikant, Y.N.: Communication Cost Estimation and Global Data Partitioning for Distributed Memory Machines. In: Fourth International Conference on High Performance Computing, Bangalore (1997)

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Raghavendra, P. et al. (2010). A Study of Performance Scalability by Parallelizing Loop Iterations on Multi-core SMPs. In: Hsu, CH., Yang, L.T., Park, J.H., Yeo, SS. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2010. Lecture Notes in Computer Science, vol 6081. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13119-6_41

  • DOI: https://doi.org/10.1007/978-3-642-13119-6_41

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13118-9

  • Online ISBN: 978-3-642-13119-6

  • eBook Packages: Computer Science, Computer Science (R0)
