Abstract
Clustering algorithms such as k-Means fail to function appropriately when used to analyze high-dimensional data. To achieve good clustering, dimensionality reduction through feature selection or feature extraction is therefore needed. Principal Component Analysis (PCA) is a frequently used extraction method; however, its reduction results are often unsatisfactory because of low clustering quality and lengthy processing time. It is therefore necessary to study other methods as alternatives to PCA. This study compares the clustering results of Indonesian text documents whose dimensions had been reduced by PCA, Self-Organizing Map (SOM), and Isometric Feature Mapping (Isomap). Clustering quality was measured using the Davies-Bouldin Index, alongside computational time and the number of iterations. The results show that SOM tends to improve cluster quality, reaching 269.084% better than k-Means alone, while Isomap can speed up clustering computation time by a factor of 190. In addition, the qualitative results identify the most appropriate extraction method for reducing the clustering features of Indonesian-language text documents.
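As an illustration of the quality measure named above, the following is a minimal sketch of the Davies-Bouldin Index in plain Python. It assumes the standard definition with Euclidean distance and is not taken from the paper's own implementation; the function names `centroid` and `davies_bouldin` are hypothetical. Lower values indicate more compact, better separated clusters.

```python
import math


def centroid(points):
    """Return the coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def davies_bouldin(points, labels):
    """Davies-Bouldin Index (standard definition, Euclidean distance).

    Assumes at least two clusters with distinct centroids.
    """
    # Group points by their cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    cents = {lab: centroid(pts) for lab, pts in clusters.items()}
    # Scatter: mean distance from each cluster's points to its centroid.
    scatter = {
        lab: sum(math.dist(p, cents[lab]) for p in pts) / len(pts)
        for lab, pts in clusters.items()
    }

    # For each cluster, take the worst (largest) similarity ratio
    # against any other cluster, then average over all clusters.
    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Toy data: two clusters whose centroids are 10 units apart.
tight = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
loose = [(0.0, 0.0), (0.0, 4.0), (10.0, 0.0), (10.0, 4.0)]
labels = [0, 0, 1, 1]

dbi_tight = davies_bouldin(tight, labels)
dbi_loose = davies_bouldin(loose, labels)
```

On this toy data the more compact clustering receives the lower index, which is the direction in which the index is read when comparing reduced and unreduced feature sets.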
Copyright information
© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Jambak, M.I., Jambak, A.I.I., Febrianto, R.T., Saputra, D.M., Jambak, M.I. (2021). Dimension Reduction with Extraction Methods (Principal Component Analysis - Self Organizing Map - Isometric Mapping) in Indonesian Language Text Documents Clustering. In: Abraham, A., Shandilya, S., Garcia-Hernandez, L., Varela, M. (eds) Hybrid Intelligent Systems. HIS 2019. Advances in Intelligent Systems and Computing, vol 1179. Springer, Cham. https://doi.org/10.1007/978-3-030-49336-3_1
DOI: https://doi.org/10.1007/978-3-030-49336-3_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-49335-6
Online ISBN: 978-3-030-49336-3
eBook Packages: Intelligent Technologies and Robotics (R0)