A Data Stream Clustering Algorithm Based on Density and Extended Grid

Hua, Zheng; Du, Tao; Qu, Shouning; Mou, Guodong

doi:10.1007/978-3-319-63312-1_61

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10362)

Included in the following conference series:

International Conference on Intelligent Computing

Abstract

Building on the traditional grid-density clustering algorithm, this paper proposes DEGDS, a data stream clustering algorithm based on density and an extended grid. The algorithm combines the advantages of grid-based and density-based clustering, removes the drawback of manually set clustering parameters, and finds clusters of arbitrary shape. It uses the local density of each sample point and its distance from the other sample points to determine the number of cluster centers in each grid, so that cluster centers are selected automatically and the result is not distorted by a poorly chosen initial centroid. The algorithm is parallelized by partitioning the data with the Spark parallel framework. For data points that fall outside the grid, the grid is extended so that the clustering inside it expands to cover them, which preserves clustering accuracy. Grids are merged using a density-connectivity estimate together with the grid boundaries, which reduces memory consumption. An attenuation factor is used to update grid density incrementally, so that the densities reflect the evolution of the spatial data stream. Experimental results show that, compared with the traditional clustering algorithm, DEGDS substantially improves accuracy and efficiency and can be applied effectively to clustering large data sets.
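
The incremental density update mentioned in the abstract can be illustrated with a small sketch. The abstract does not state the update rule, so the code below assumes the exponential-decay form common in grid-based stream clustering (as in D-Stream): a cell's density is multiplied by decay^(elapsed time) and increased by 1 when a new point arrives. The class name `DecayedGrid` and the parameters `cell_width` and `decay` are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


class DecayedGrid:
    """Sparse grid whose cell densities fade over time (illustrative only).

    Assumed update rule, not confirmed by the paper:
        D(cell, t) = decay ** (t - t_last) * D(cell, t_last) + 1
    """

    def __init__(self, cell_width, decay=0.98):
        if not 0.0 < decay < 1.0:
            raise ValueError("decay must lie strictly between 0 and 1")
        self.cell_width = cell_width
        self.decay = decay
        self.density = defaultdict(float)  # cell -> density at its last update
        self.last_update = {}              # cell -> time of its last update

    def cell_of(self, point):
        """Map a point to the integer coordinates of its grid cell."""
        return tuple(math.floor(x / self.cell_width) for x in point)

    def insert(self, point, t):
        """Age the cell's stored density up to time t, then count the new point."""
        cell = self.cell_of(point)
        elapsed = t - self.last_update.get(cell, t)
        self.density[cell] = self.density[cell] * self.decay ** elapsed + 1.0
        self.last_update[cell] = t
        return cell

    def density_at(self, cell, t):
        """Read a cell's density as of time t without modifying the grid."""
        if cell not in self.last_update:
            return 0.0
        return self.density[cell] * self.decay ** (t - self.last_update[cell])
```

Only cells that have received a point are stored, and a cell is aged lazily when it is next touched or read, so an update costs constant time per arriving point regardless of how many cells exist.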

Author information

Authors and Affiliations

School of Information Science and Engineering, University of Jinan, No. 336, West Road of Nan Xinzhuang, Jinan, 250022, Shandong, China
Zheng Hua, Tao Du, Shouning Qu & Guodong Mou
Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, 250022, China
Zheng Hua, Tao Du, Shouning Qu & Guodong Mou

Corresponding author

Correspondence to Tao Du.

Editor information

Editors and Affiliations

Tongji University, Shanghai, China
De-Shuang Huang
University of Ulsan, Ulsan, Korea (Republic of)
Kang-Hyun Jo
Universidad Distrital Francisco José de Caldas, Bogotá, Colombia
Juan Carlos Figueroa-García

About this paper

Cite this paper

Hua, Z., Du, T., Qu, S., Mou, G. (2017). A Data Stream Clustering Algorithm Based on Density and Extended Grid. In: Huang, DS., Jo, KH., Figueroa-García, J. (eds) Intelligent Computing Theories and Application. ICIC 2017. Lecture Notes in Computer Science, vol 10362. Springer, Cham. https://doi.org/10.1007/978-3-319-63312-1_61

DOI: https://doi.org/10.1007/978-3-319-63312-1_61
Published: 20 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63311-4
Online ISBN: 978-3-319-63312-1
eBook Packages: Computer Science, Computer Science (R0)