A study of large-scale data clustering based on fuzzy clustering

  • Methodologies and Application
Soft Computing

Abstract

Large-scale data are any data that cannot be loaded into the main memory of an ordinary computer. This is not an objective definition of large-scale data, but it makes clear what large-scale data are. We first review some existing algorithms for clustering large-scale data, including several data stream clustering algorithms based on the fuzzy c-means (FCM) algorithm. We then propose a new structure for clustering large-scale data; two new data stream clustering algorithms based on this structure are proposed in Sects. 3 and 4. In our method, the objects in the dataset are loaded one by one. We set a membership threshold: if the membership of an object in a cluster is larger than the threshold, the object is assigned to that cluster and the location of the nearest cluster center is updated; otherwise, the object is put into a temporary matrix, which we call the pool. When the pool is full, we cluster the data in the pool and update the locations of the cluster centers. Both algorithms are based on this data stream structure; they differ in how the objects in the data are weighted. We test our algorithms on a handwritten digit image dataset and several large-scale UCI datasets and compare them with some existing algorithms. The experiments show that our algorithms are better suited to clustering large-scale datasets.
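The threshold-and-pool structure described in the abstract can be illustrated with a minimal Python sketch. It is not the algorithm of Sects. 3 or 4, whose weighting schemes are not given here. The function names (`memberships`, `weighted_fcm`, `flush_pool`, `stream_cluster`) and the parameters `threshold`, `pool_size` and `mass` are hypothetical. Three further choices are assumptions made only for illustration: memberships use the standard FCM formula, a confidently assigned object moves its nearest center by a running mean, and a full pool is clustered together with the current centers, which enter as weighted points.

```python
import math

def memberships(x, centers, m=2.0):
    """Standard FCM membership of point x in each cluster (fuzzifier m > 1)."""
    dists = [math.dist(x, c) for c in centers]
    for i, d in enumerate(dists):
        if d == 0.0:  # x coincides with a center: full membership there
            return [1.0 if j == i else 0.0 for j in range(len(centers))]
    inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
    total = sum(inv)
    return [v / total for v in inv]

def weighted_fcm(points, weights, centers, m=2.0, iters=20):
    """Run a fixed number of weighted FCM iterations from the given centers."""
    dim = len(centers[0])
    for _ in range(iters):
        num = [[0.0] * dim for _ in centers]
        den = [0.0] * len(centers)
        for x, w in zip(points, weights):
            for i, u in enumerate(memberships(x, centers, m)):
                g = w * u ** m
                den[i] += g
                for k in range(dim):
                    num[i][k] += g * x[k]
        centers = [[n / den[i] for n in num[i]] if den[i] > 0 else centers[i]
                   for i in range(len(centers))]
    return centers

def flush_pool(pool, centers, mass, m=2.0):
    """Assumed pool step: cluster the pooled objects together with the
    current centers, each center weighted by the mass it has absorbed."""
    points = pool + centers
    weights = [1.0] * len(pool) + mass
    new_centers = weighted_fcm(points, weights, centers, m)
    for x in pool:  # credit each cluster with the membership it received
        for i, u in enumerate(memberships(x, new_centers, m)):
            mass[i] += u
    return new_centers

def stream_cluster(stream, init_centers, threshold=0.8, pool_size=20, m=2.0):
    """Process objects one by one using a membership threshold and a pool."""
    centers = [list(map(float, c)) for c in init_centers]
    mass = [1.0] * len(centers)  # assumed: each initial center counts as one object
    pool = []
    for x in stream:
        u = memberships(x, centers, m)
        best = max(range(len(u)), key=u.__getitem__)  # largest membership = nearest center
        if u[best] > threshold:
            # Confident assignment: running-mean update (an assumption).
            mass[best] += 1.0
            centers[best] = [c + (xk - c) / mass[best]
                             for c, xk in zip(centers[best], x)]
        else:
            pool.append(list(x))
            if len(pool) >= pool_size:
                centers = flush_pool(pool, centers, mass, m)
                pool = []
    if pool:  # cluster whatever is left in the pool at the end of the stream
        centers = flush_pool(pool, centers, mass, m)
    return centers
```

A call such as `stream_cluster(data, [[1.0, 1.0], [9.0, 9.0]])` takes any iterable of equal-length numeric sequences and returns the final centers. Only the pool and the centers are held in memory at once, which is the point of the structure for data that do not fit in main memory.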

Acknowledgments

This work was supported by the Program for New Century Excellent Talents in University (No. NCET-12-0920), the Program for New Scientific and Technological Star of Shaanxi Province (No. 2014KJXX-45), the National Natural Science Foundation of China (Nos. 61272279, 61272282, 61371201, and 61203303), the Fundamental Research Funds for the Central Universities (Nos. K5051302049, K5051302023, K50511020011, K5051302002, and K5051302028), the Provincial Natural Science Foundation of Shaanxi of China (No. 2011JQ8020), the Fund for Foreign Scholars in University Research and Teaching Programs (the 111 Project) (No. B07048), and the EU IRSES project (No. 247619).

Author information

Corresponding author

Correspondence to Yangyang Li.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Li, Y., Yang, G., He, H. et al. A study of large-scale data clustering based on fuzzy clustering. Soft Comput 20, 3231–3242 (2016). https://doi.org/10.1007/s00500-015-1698-1

  • DOI: https://doi.org/10.1007/s00500-015-1698-1