Parallel Canopy Clustering on GPUs

Kozawa, Yusuke; Hayashi, Fumitaka; Amagasa, Toshiyuki; Kitagawa, Hiroyuki

doi:10.1007/978-3-319-22849-5_23

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9261)

Abstract

Canopy clustering is a preprocessing method for standard clustering algorithms such as k-means and hierarchical agglomerative clustering, and it can greatly reduce their computational cost. However, a naïve implementation of canopy clustering can itself take a vast amount of time on massive data. To address this problem, we present efficient algorithms and implementations of canopy clustering on GPUs, which have recently emerged as general-purpose many-core processors. We not only accelerate the computation of the original canopy clustering, but also propose an algorithm that uses a grid index. This algorithm partitions the data into cells to reduce redundant computations while exploiting the parallelism of GPUs. Experiments show that the proposed GPU implementations are on average 2 times faster than multi-threaded SIMD implementations running on two octa-core CPUs.
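The abstract does not spell out the algorithms, so the following is offered for orientation only: a minimal Python sketch of the classic sequential canopy procedure (McCallum, Nigam and Ungar, KDD 2000) with a loose threshold `t1` and a tight threshold `t2`, alongside a grid-indexed variant. The grid variant is an illustrative assumption, not the authors' GPU algorithm: it buckets points into cells of side `t1`, so every point within `t1` of a center must lie in the center's cell or one of its 3^d neighbouring cells. All function names (`canopy_naive`, `build_grid`, `canopy_grid`) are hypothetical, Euclidean distance and "lowest remaining index" center selection are assumed, and the code is sequential rather than GPU-parallel. This sketch compares each center against every point in the dataset; some formulations only scan the points still on the candidate list.

```python
import math
from collections import defaultdict
from itertools import product


def _check(t1, t2):
    # Canopy clustering needs a loose threshold t1 strictly above a tight threshold t2.
    if not t1 > t2 >= 0:
        raise ValueError("expected t1 > t2 >= 0")


def canopy_naive(points, t1, t2):
    """Sequential canopy clustering: each center is compared with every point."""
    _check(t1, t2)
    remaining = set(range(len(points)))  # points that may still become centers
    canopies = []
    while remaining:
        c = min(remaining)  # deterministic center choice (an assumption)
        members = []
        for i, p in enumerate(points):
            d = math.dist(points[c], p)
            if d < t1:  # loose threshold: point joins this canopy
                members.append(i)
                if d < t2:  # tight threshold: point can no longer be a center
                    remaining.discard(i)
        remaining.discard(c)
        canopies.append((c, members))
    return canopies


def _cell(p, side):
    # Integer cell coordinates of point p in a uniform grid with the given side length.
    return tuple(math.floor(x / side) for x in p)


def build_grid(points, side):
    """Hypothetical grid index: cell coordinates -> list of point indices."""
    grid = defaultdict(list)
    for i, p in enumerate(points):
        grid[_cell(p, side)].append(i)
    return grid


def canopy_grid(points, t1, t2):
    """Same output as canopy_naive, but only scans the 3^d cells around each center."""
    _check(t1, t2)
    if not points:
        return []
    grid = build_grid(points, t1)  # side t1 => points within t1 sit in adjacent cells
    dim = len(points[0])
    remaining = set(range(len(points)))
    canopies = []
    while remaining:
        c = min(remaining)
        base = _cell(points[c], t1)
        members = []
        for offset in product((-1, 0, 1), repeat=dim):
            key = tuple(b + o for b, o in zip(base, offset))
            for i in grid.get(key, ()):  # .get does not create empty cells
                d = math.dist(points[c], points[i])
                if d < t1:
                    members.append(i)
                    if d < t2:
                        remaining.discard(i)
        remaining.discard(c)
        canopies.append((c, sorted(members)))  # sort to match canopy_naive's order
    return canopies


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.5, 0.0), (1.5, 0.0), (5.0, 5.0), (5.2, 5.1)]
    print(canopy_naive(pts, 2.0, 1.0))
    # -> [(0, [0, 1, 2]), (2, [0, 1, 2]), (3, [3, 4])]
    print(canopy_grid(pts, 2.0, 1.0) == canopy_naive(pts, 2.0, 1.0))
    # -> True
```

Because both functions pick centers in the same order, they return identical canopies; the grid only prunes distance computations against far-away cells, which is the kind of redundant work the abstract says cell partitioning is meant to avoid. Note that canopies overlap (point 2 above belongs to two canopies), which is expected: only points inside the tight threshold are barred from becoming later centers.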

F. Hayashi is currently working at International Laboratory Corporation.

Acknowledgments

This research was partly supported by the Grant-in-Aid for Scientific Research (B) (#26280037) from the Japan Society for the Promotion of Science.

Author information

Authors and Affiliations

Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba, Japan
Yusuke Kozawa & Fumitaka Hayashi
Faculty of Engineering, Information and Systems, University of Tsukuba, Tsukuba, Japan
Toshiyuki Amagasa & Hiroyuki Kitagawa

Corresponding author

Correspondence to Yusuke Kozawa.

Editor information

Editors and Affiliations

Hewlett-Packard Enterprise, Sunnyvale, California, USA
Qiming Chen
Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
Blaise Pascal University, Aubiere, France
Farouk Toumani
University of Linz, Linz, Austria
Roland Wagner
Universidad Politécnica de Valencia, Valencia, Spain
Hendrik Decker

About this paper

Cite this paper

Kozawa, Y., Hayashi, F., Amagasa, T., Kitagawa, H. (2015). Parallel Canopy Clustering on GPUs. In: Chen, Q., Hameurlain, A., Toumani, F., Wagner, R., Decker, H. (eds) Database and Expert Systems Applications. Globe DEXA 2015. Lecture Notes in Computer Science, vol 9261. Springer, Cham. https://doi.org/10.1007/978-3-319-22849-5_23

DOI: https://doi.org/10.1007/978-3-319-22849-5_23
Published: 11 August 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22848-8
Online ISBN: 978-3-319-22849-5
eBook Packages: Computer Science, Computer Science (R0)