DOI: 10.1145/2304576.2304621
research-article

GPU merge path: a GPU merging algorithm

Published: 25 June 2012

ABSTRACT

Graphics Processing Units (GPUs) have become ideal candidates for the development of fine-grain parallel algorithms as the number of processing elements per GPU increases. In addition to the increase in cores per system, new memory hierarchies and increased bandwidth have been developed that allow for significant performance improvement when computation is performed using certain types of memory access patterns.

Merging two sorted arrays is a useful primitive and a basic building block for numerous applications, such as joining database queries, merging adjacency lists in graphs, and set intersection. An efficient parallel merging algorithm partitions the sorted input arrays into sets of non-overlapping sub-arrays that can be merged independently on multiple cores. For optimal performance, the partitioning should be done in parallel and should divide the input arrays so that each core receives an equal amount of data to merge.
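The partitioning idea described above can be illustrated with a minimal sequential sketch. It assumes the diagonal binary search used by merge-path style partitioning; the function names (`diagonal_split`, `merge_path_partitions`, `partitioned_merge`) are hypothetical and are not taken from the paper, and the loop over segments merely stands in for work that separate cores would do independently.

```python
def diagonal_split(a, b, d):
    """Return i such that the first d merged outputs are a[:i] and b[:d - i]."""
    lo = max(0, d - len(b))   # cannot take more than len(b) items from b
    hi = min(d, len(a))       # cannot take more than d items from a
    while lo < hi:
        mid = (lo + hi) // 2
        # Ties go to `a`: if a[mid] <= b[d - mid - 1], then a[mid] must be
        # among the first d outputs, so at least mid + 1 items come from a.
        if a[mid] <= b[d - mid - 1]:
            lo = mid + 1
        else:
            hi = mid
    return lo


def merge_path_partitions(a, b, parts):
    """Split points (i, j) that cut the merged output into `parts` near-equal runs."""
    total = len(a) + len(b)
    bounds = []
    for k in range(parts + 1):
        d = k * total // parts          # k-th diagonal: d outputs precede it
        i = diagonal_split(a, b, d)
        bounds.append((i, d - i))
    return bounds


def sequential_merge(x, y):
    """Plain two-pointer merge of two sorted lists (ties taken from x first)."""
    out, i, j = [], 0, 0
    while i < len(x) and j < len(y):
        if x[i] <= y[j]:
            out.append(x[i])
            i += 1
        else:
            out.append(y[j])
            j += 1
    out.extend(x[i:])
    out.extend(y[j:])
    return out


def partitioned_merge(a, b, parts):
    """Merge each partition independently, then concatenate the results."""
    bounds = merge_path_partitions(a, b, parts)
    out = []
    for (i0, j0), (i1, j1) in zip(bounds, bounds[1:]):
        # On a parallel machine, each of these merges could run on its own core.
        out.extend(sequential_merge(a[i0:i1], b[j0:j1]))
    return out
```

In this sketch each split point is found by its own binary search, so the searches do not depend on one another, and every segment produces the same number of output elements (up to rounding) regardless of how the values are distributed between the two inputs.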

In this paper, we present an algorithm that partitions the workload equally among the GPU's Streaming Multiprocessors (SMs). We then show how each SM performs a parallel merge and how to divide the work so that all of the GPU's Streaming Processors (SPs) are utilized. All stages of the algorithm are parallel. The new algorithm makes good use of the GPU memory hierarchy. It achieves average speedups of 20X and 50X over a sequential merge on the x86 platform for integer and floating-point data, respectively. Our implementation is 10X faster than the fast parallel merge supplied in the CUDA Thrust library.


Published in

ICS '12: Proceedings of the 26th ACM international conference on Supercomputing
June 2012
400 pages
ISBN: 9781450313162
DOI: 10.1145/2304576

Copyright © 2012 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States


Acceptance Rates

Overall acceptance rate: 584 of 2,055 submissions, 28%
