Parallel Canopy Clustering on GPUs

  • Conference paper
Database and Expert Systems Applications (Globe 2015, DEXA 2015)

Abstract

Canopy clustering is a preprocessing method for standard clustering algorithms such as k-means and hierarchical agglomerative clustering. It can greatly reduce the computational cost of these algorithms. However, a naive implementation of canopy clustering can itself take a vast amount of time on massive data. To address this problem, we present efficient algorithms and implementations of canopy clustering on GPUs, which have recently evolved into general-purpose many-core processors. We not only accelerate the computation of the original canopy clustering, but also propose an algorithm that uses a grid index. This algorithm partitions the data into cells to reduce redundant computations and, at the same time, to exploit the parallelism of GPUs. Experiments show that the proposed GPU implementations are on average 2 times faster than multi-threaded SIMD implementations on two octa-core CPUs.
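For readers unfamiliar with the preprocessing step the abstract refers to, the following is a minimal sequential sketch of canopy clustering in its commonly described form, with a loose threshold and a tight threshold. It is not the paper's GPU algorithm and does not use a grid index. The function name `canopy_clustering`, the parameter names `t1` and `t2`, the use of Euclidean distance, and the choice to test only the points still in the candidate list are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def canopy_clustering(
    points: Sequence[Point], t1: float, t2: float
) -> List[Tuple[int, List[int]]]:
    """Return a list of (center_index, member_indices) canopies.

    t1 is the loose threshold and t2 the tight threshold (t1 > t2 > 0).
    Names and the Euclidean metric are illustrative assumptions.
    """
    if not t1 > t2 > 0:
        raise ValueError("thresholds must satisfy t1 > t2 > 0")

    candidates = list(range(len(points)))  # points that may still become centers
    canopies: List[Tuple[int, List[int]]] = []

    while candidates:
        center = candidates[0]
        members: List[int] = []
        remaining: List[int] = []
        for i in candidates:
            d = math.dist(points[center], points[i])
            if d < t1:
                members.append(i)    # inside the loose radius: joins this canopy
            if d >= t2:
                remaining.append(i)  # outside the tight radius: stays a candidate
        canopies.append((center, members))
        candidates = remaining       # the center itself (d == 0) is always dropped

    return canopies


pts = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0), (3.0, 0.0)]
result = canopy_clustering(pts, t1=3.0, t2=1.5)
for center, members in result:
    print(center, members)
```

In this example the point at index 4 falls inside the loose radius of the second canopy but outside its tight radius, so it joins that canopy and later also forms one of its own. Canopies may therefore overlap. Each distance test inside the inner loop depends only on the current center and one point, which makes that loop a natural candidate for data-parallel execution.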

F. Hayashi—Currently working at International Laboratory Corporation.

Notes

  1. http://nvlabs.github.io/moderngpu/

  2. https://code.google.com/p/thrust/

Acknowledgments

This research was partly supported by the Grant-in-Aid for Scientific Research (B) (#26280037) from the Japan Society for the Promotion of Science.

Author information

Corresponding author

Correspondence to Yusuke Kozawa.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Kozawa, Y., Hayashi, F., Amagasa, T., Kitagawa, H. (2015). Parallel Canopy Clustering on GPUs. In: Chen, Q., Hameurlain, A., Toumani, F., Wagner, R., Decker, H. (eds) Database and Expert Systems Applications. Globe 2015, DEXA 2015. Lecture Notes in Computer Science, vol 9261. Springer, Cham. https://doi.org/10.1007/978-3-319-22849-5_23

  • DOI: https://doi.org/10.1007/978-3-319-22849-5_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-22848-8

  • Online ISBN: 978-3-319-22849-5

  • eBook Packages: Computer Science, Computer Science (R0)
