Abstract
Machine learning on Big Data is drawing attention as a way to retrieve useful knowledge inherent in multi-dimensional information and to discover new knowledge in fields concerned with storing and retrieving the massive multi-dimensional information that is continually produced. Machine learning techniques can be divided into supervised and unsupervised learning according to whether the data are labeled. Unsupervised learning, which classifies and analyzes unlabeled data, is used in various ways in the analysis of multi-dimensional Big Data. This study analyzes the problems of the conventional K-means algorithm and proposes an altered K-means algorithm that determines the number of clusters automatically. It also proposes optimizing the number of clusters by applying principal component analysis, as a pre-processing step, to the input data for clustering. The performance evaluation results confirm that, measured by the cluster validity index (CVI), the proposed algorithm was more accurate than the conventional K-means algorithm.
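The abstract does not spell out the altered algorithm or which CVI was used, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes NumPy, reduces the input with principal component analysis, runs a plain K-means for several candidate cluster counts, and keeps the count with the highest silhouette score (a common CVI, swapped in here as a stand-in). All function names, the farthest-first initialisation, and the synthetic data are hypothetical.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X onto its top principal components (SVD of the centered data)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def farthest_first_centers(X, k):
    """Deterministic init (an assumption): start at X[0], then repeatedly
    add the point farthest from all centers chosen so far."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    return np.array(centers)

def kmeans(X, k, n_iter=100):
    """Plain Lloyd iterations; returns one cluster label per row of X."""
    centers = farthest_first_centers(X, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old center if a cluster happens to become empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def silhouette(X, labels):
    """Mean silhouette coefficient; singleton clusters score 0 by convention."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue
        a = D[i, own].sum() / (n_own - 1)  # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

def choose_k(X, k_values):
    """Return the candidate k with the highest silhouette, plus all scores."""
    scores = {k: silhouette(X, kmeans(X, k)) for k in k_values}
    return max(scores, key=scores.get), scores

# Synthetic, unlabeled 5-D data drawn around three well-separated centers.
rng = np.random.default_rng(0)
true_centers = np.array([[0, 0, 0, 0, 0],
                         [6, 6, 0, 0, 0],
                         [0, 6, 6, 0, 0]], dtype=float)
X = np.vstack([c + 0.3 * rng.standard_normal((50, 5)) for c in true_centers])

Z = pca_reduce(X, n_components=2)          # pre-processing step
best_k, scores = choose_k(Z, range(2, 7))  # automatic choice of k
print(best_k, scores)
```

On data this cleanly separated, the silhouette should peak at three clusters. Real Big Data would need a scalable index, since the pairwise distance matrix used here grows quadratically with the number of points.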
Acknowledgements
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2017R1D1A3B03035379).
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Jung, SH., So, WH., You, KS., Sim, CB. (2019). A Novel on Altered K-Means Algorithm for Clustering Cost Decrease of Non-labeling Big-Data. In: Park, J., Loia, V., Choo, KK., Yi, G. (eds) Advanced Multimedia and Ubiquitous Engineering. MUE FutureTech 2018 2018. Lecture Notes in Electrical Engineering, vol 518. Springer, Singapore. https://doi.org/10.1007/978-981-13-1328-8_48
DOI: https://doi.org/10.1007/978-981-13-1328-8_48
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1327-1
Online ISBN: 978-981-13-1328-8
eBook Packages: Engineering, Engineering (R0)