ABSTRACT
The Threaded Many-core Memory (TMM) model provides a framework for analyzing the performance of algorithms on GPUs. Here, we investigate the effectiveness of the TMM model by using it to analyze algorithms for three classic problems: suffix trees/arrays for string matching, the fast Fourier transform, and merge sort. Our findings indicate that the TMM model can explain and predict trends and artifacts in experimental data that were previously unexplained.
Theoretical analysis of classic algorithms on highly-threaded many-core GPUs (PPoPP '14)