
Dimension Reduction with Extraction Methods (Principal Component Analysis - Self Organizing Map - Isometric Mapping) in Indonesian Language Text Documents Clustering

Conference paper, Hybrid Intelligent Systems (HIS 2019)

Abstract

Clustering algorithms such as k-Means fail to function appropriately when applied to high-dimensional data. Achieving good clustering therefore requires dimension reduction, by either feature selection or feature extraction. Principal Component Analysis (PCA) is a commonly used extraction method; however, its reduction results are not very good, yielding low clustering quality and lengthy processing time. It is therefore necessary to study other methods as alternatives to PCA. This study compares the results of clustering Indonesian text documents whose dimensions had been reduced by PCA, Self-Organizing Map (SOM), and Isometric Feature Mapping (Isomap). Clustering quality was measured with the Davies-Bouldin Index, together with computation time and number of iterations. The results show that SOM tends to improve cluster quality by 269.084% compared with k-Means, while Isomap speeds up clustering computation time by a factor of 190. In addition, the qualitative outcome identifies the most appropriate extraction method for reducing the features used to cluster Indonesian-language text documents.
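A minimal sketch of the kind of evaluation the abstract describes, assuming a numeric document-term matrix (for example TF-IDF) has already been built from the preprocessed Indonesian documents. It reduces dimensions with PCA (computed via SVD), clusters with plain Lloyd's k-Means, and reports the Davies-Bouldin Index (lower is better), the number of k-Means iterations, and the clustering time. The function names (`pca_reduce`, `kmeans`, `davies_bouldin`, `evaluate`), the toy data, and the choice to time only the k-Means step are hypothetical illustrations, not the authors' code; SOM and Isomap are not implemented here.

```python
# Illustrative sketch: function names and toy data are hypothetical, not from the paper.
import time

import numpy as np


def pca_reduce(X, k):
    """Project the rows of X onto the top-k principal components (PCA via SVD)."""
    Xc = X - X.mean(axis=0)  # center every feature (term) column
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T  # n_docs x k matrix of component scores


def kmeans(X, n_clusters, max_iter=100, seed=0):
    """Lloyd's k-Means; returns labels, centroids and the number of iterations."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), n_clusters, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for it in range(1, max_iter + 1):
        # assign every document to its nearest centroid (squared Euclidean distance)
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # recompute centroids; an empty cluster keeps its previous centroid
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(n_clusters)])
        if np.allclose(new, centroids):
            return labels, new, it
        centroids = new
    return labels, centroids, max_iter


def davies_bouldin(X, labels):
    """Davies-Bouldin Index; lower values mean compact, well-separated clusters."""
    ks = np.unique(labels)
    cents = np.array([X[labels == k].mean(axis=0) for k in ks])
    # S_i: mean Euclidean distance of cluster i's members to its centroid
    s = np.array([np.linalg.norm(X[labels == k] - c, axis=1).mean()
                  for k, c in zip(ks, cents)])
    worst = []
    for i in range(len(ks)):
        # R_ij = (S_i + S_j) / d(c_i, c_j); keep the worst (largest) ratio for cluster i
        worst.append(max((s[i] + s[j]) / np.linalg.norm(cents[i] - cents[j])
                         for j in range(len(ks)) if j != i))
    return float(np.mean(worst))


def evaluate(X, n_clusters, reduce=None, k=None):
    """Cluster raw or reduced features; return (DBI, iterations, k-Means seconds)."""
    Z = reduce(X, k) if reduce is not None else X
    start = time.perf_counter()  # assumption: only the clustering step is timed
    labels, _, iters = kmeans(Z, n_clusters)
    elapsed = time.perf_counter() - start
    return davies_bouldin(Z, labels), iters, elapsed


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # toy stand-in for a TF-IDF matrix: 3 groups of 30 "documents" x 200 "terms"
    centers = rng.normal(0.0, 5.0, size=(3, 200))
    X = np.vstack([c + rng.normal(0.0, 1.0, size=(30, 200)) for c in centers])
    for name, red in [("no reduction", None), ("PCA, 10 components", pca_reduce)]:
        dbi, iters, secs = evaluate(X, 3, red, 10)
        print(f"{name}: DBI={dbi:.3f}, iterations={iters}, time={secs:.5f}s")
```

Under this sketch, SOM or Isomap would be compared by passing another function with the same `(X, k)` signature as `reduce`. Because the DBI is computed in whichever space was clustered, scores from raw and reduced features are not strictly on the same scale.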

Author information

Correspondence to Muhammad Ihsan Jambak.

Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Jambak, M.I., Jambak, A.I.I., Febrianto, R.T., Saputra, D.M., Jambak, M.I. (2021). Dimension Reduction with Extraction Methods (Principal Component Analysis - Self Organizing Map - Isometric Mapping) in Indonesian Language Text Documents Clustering. In: Abraham, A., Shandilya, S., Garcia-Hernandez, L., Varela, M. (eds) Hybrid Intelligent Systems. HIS 2019. Advances in Intelligent Systems and Computing, vol 1179. Springer, Cham. https://doi.org/10.1007/978-3-030-49336-3_1