ABSTRACT
Clustering analysis of large-scale image data is increasingly applied in fields such as finance, risk management, and prediction, and has drawn sustained attention in the research literature. Deep learning and classical methods are both used to address complex classification problems that arise in real-world cases. In this study, we apply several approaches to such problems and measure their effectiveness by combining different techniques and comparing the results across scenarios. Many clustering methods have been proposed, including hierarchical, density-based, centroid-based, and graph-theoretic approaches. In real-world applications, however, these methods show significant drawbacks when the dataset contains vagueness, uncertainty, or overlapping samples that make the data impossible to predict and classify. Several attempts have been made to improve clustering performance, including joint CNN clustering models, but many of them inherit the drawbacks of the underlying complicated clustering method, which limits the capability of the CNN. The combined CNN clustering method is designed to address the problems of these deterministic CNN clustering models. We evaluated it on a dataset collected from the website designbyhumans.com, which has enough features to be representative of non-synthetic data. This research aims to improve upon the established model by using estimation techniques to determine model parameters, with plots that justify those choices and give insight into how the model performs on a non-synthetic dataset such as ours. We conclude that the model improves significantly on a popular complex clustering method, as evaluated by computational time and by several metrics of how well separated the clusters are.
Based on the experiments conducted, we also discuss some drawbacks of this approach and directions for the method's future development.
Index Terms
- Improvement for Large-Scale Image Data using Fuzzy Rough C-Mean Based Unsupervised CNN Clustering: An Empirical Study on designbyhumans.com