DOI: 10.1145/1378533.1378574
research-article

Cache-efficient dynamic programming algorithms for multicores

Published: 14 June 2008

ABSTRACT

We present cache-efficient chip multiprocessor (CMP) algorithms with good speed-up for some widely used dynamic programming algorithms. We consider three types of caching systems for CMPs: D-CMP, with a private cache for each core; S-CMP, with a single cache shared by all cores; and Multicore, which has private L1 caches and a shared L2 cache. We derive results for three classes of problems: local dependency dynamic programming (LDDP), the Gaussian Elimination Paradigm (GEP), and the parenthesis problem.
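
To make the three problem classes concrete, the sketch below gives one standard textbook instance of each recurrence shape: longest common subsequence for LDDP, Floyd-Warshall all-pairs shortest paths for GEP, and matrix-chain multiplication for the parenthesis problem. These are plain sequential versions chosen here for illustration; they are assumptions about representative instances, not the cache-efficient CMP algorithms of the paper, and all function names are hypothetical.

```python
import math


def lcs_length(x, y):
    """LDDP shape: each cell depends only on its three neighbouring cells."""
    n, m = len(x), len(y)
    c = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c[n][m]


def floyd_warshall(dist):
    """GEP shape: a triply nested loop updating d[i][j] from d[i][k], d[k][j]."""
    n = len(dist)
    d = [row[:] for row in dist]  # work on a copy, leave the input untouched
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def matrix_chain_cost(dims):
    """Parenthesis shape: c[i][j] is a minimum over every split point k."""
    n = len(dims) - 1  # matrix t has shape dims[t] x dims[t + 1]
    c = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            c[i][j] = min(
                c[i][k] + c[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)
            )
    return c[0][n - 1] if n > 0 else 0
```

The three functions differ mainly in how far a cell's dependencies reach: a constant neighbourhood for LDDP, a full row and column per outer iteration for GEP, and all shorter intervals for the parenthesis problem.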

For each class of problems, we develop a generic CMP algorithm with an associated tiling sequence. We then tailor this tiling sequence to each caching model and provide a parallel schedule that results in a cache-efficient parallel execution up to the critical path length of the underlying dynamic programming algorithm.
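
As a rough illustration of what tiling with a parallel schedule can look like for an LDDP problem, the sketch below fills a longest common subsequence table in square tiles and visits the tiles one anti-diagonal at a time. Tiles on the same anti-diagonal do not depend on each other, so in principle they could be assigned to different cores. This is a simplified serial sketch under stated assumptions: the function name and the `tile` parameter are hypothetical, and the paper's actual tiling sequences and per-cache-model schedules are not reproduced here.

```python
def tiled_lcs_length(x, y, tile=4):
    """Fill the LCS table tile by tile, in anti-diagonal (wavefront) order."""
    n, m = len(x), len(y)
    c = [[0] * (m + 1) for _ in range(n + 1)]
    tile_rows = (n + tile - 1) // tile  # number of tile rows (ceiling division)
    tile_cols = (m + tile - 1) // tile  # number of tile columns

    def fill_tile(ti, tj):
        # Cells of tile (ti, tj); the tile reads only cells from tiles
        # (ti-1, tj), (ti, tj-1) and (ti-1, tj-1), which are already done.
        for i in range(ti * tile + 1, min((ti + 1) * tile, n) + 1):
            for j in range(tj * tile + 1, min((tj + 1) * tile, m) + 1):
                if x[i - 1] == y[j - 1]:
                    c[i][j] = c[i - 1][j - 1] + 1
                else:
                    c[i][j] = max(c[i - 1][j], c[i][j - 1])

    # Wavefront d holds every tile with ti + tj == d. These tiles are
    # mutually independent; this sketch simply runs them one after another.
    for d in range(tile_rows + tile_cols - 1):
        first = max(0, d - tile_cols + 1)
        last = min(d, tile_rows - 1)
        for ti in range(first, last + 1):
            fill_tile(ti, d - ti)
    return c[n][m]
```

In this sketch the tile size is the tuning knob: a tile small enough to fit in a core's private cache would suit a D-CMP-like setting, while a shared cache would have to hold all tiles of the active wavefront at once. Those sizing choices are only indicative here.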

We present experimental results on an 8-core Opteron for two sequence alignment problems that are important examples of LDDP. Our experimental results show good speed-ups for simple versions of our algorithms.


Published in

SPAA '08: Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
June 2008
380 pages
ISBN: 9781595939739
DOI: 10.1145/1378533

Copyright © 2008 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall acceptance rate: 447 of 1,461 submissions, 31%
