Enhancing Clustering Performance Using Topic Modeling-Based Dimensionality Reduction

T. Ramathulasi, M. Rajasekhara Babu
Copyright: © 2022 | Volume: 13 | Issue: 1 | Pages: 16
ISSN: 1942-3926 | EISSN: 1942-3934 | EISBN13: 9781683180975 | DOI: 10.4018/IJOSSP.300755
Cite Article

MLA

Ramathulasi, T., and M. Rajasekhara Babu. "Enhancing Clustering Performance Using Topic Modeling-Based Dimensionality Reduction." IJOSSP vol.13, no.1 2022: pp.1-16. http://doi.org/10.4018/IJOSSP.300755

APA

Ramathulasi, T. & Babu, M. R. (2022). Enhancing Clustering Performance Using Topic Modeling-Based Dimensionality Reduction. International Journal of Open Source Software and Processes (IJOSSP), 13(1), 1-16. http://doi.org/10.4018/IJOSSP.300755

Chicago

Ramathulasi, T., and M. Rajasekhara Babu. "Enhancing Clustering Performance Using Topic Modeling-Based Dimensionality Reduction," International Journal of Open Source Software and Processes (IJOSSP) 13, no.1: 1-16. http://doi.org/10.4018/IJOSSP.300755

Abstract

Today, services and their operating procedures are mostly described in natural-language text. We group services by similarity to reduce the search space and search time in service innovation. Major topic models such as LSA, LDA, and CTM have not performed effectively on these descriptions because the text is short and limited, so co-occurring words are sparse or absent. To address the problems caused by short text, a Dirichlet Multinomial Mixture (DMM) model with feature representation based on Gibbs sampling was developed to reduce dimensionality before clustering and to improve clustering performance. The experimental results indicate that DMM with Gibbs sampling (DMM-Gibbs) gives better results than all the other methods when combined with agglomerative or K-means clustering. Clustering performance was evaluated using both internal and external criteria. Using this model, the dimensionality can be reduced to 93.13% and better clustering performance can be achieved.
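
The idea in the abstract can be illustrated with a minimal sketch, assuming the commonly used collapsed Gibbs sampler for a Dirichlet Multinomial Mixture, in which each short document is assigned to exactly one topic. This is not the authors' implementation: the function name `dmm_gibbs`, the hyperparameters `alpha` and `beta`, the iteration count, and the toy service descriptions are all illustrative assumptions. Each document is mapped from a vocabulary-sized representation to a short vector of topic probabilities.

```python
import math
import random
from collections import Counter


def dmm_gibbs(docs, n_topics=3, alpha=0.1, beta=0.1, n_iters=30, seed=0):
    """Collapsed Gibbs sampling for a Dirichlet Multinomial Mixture (sketch).

    docs: list of token lists. Returns (features, labels), where features[i]
    is a topic-probability vector of length n_topics for document i.
    """
    rng = random.Random(seed)
    vocab_size = len({w for d in docs for w in d})
    doc_counts = [Counter(d) for d in docs]
    m_z = [0] * n_topics                         # documents per topic
    n_z = [0] * n_topics                         # tokens per topic
    n_zw = [Counter() for _ in range(n_topics)]  # word counts per topic

    def move(i, z, sign):
        # Add (sign=+1) or remove (sign=-1) document i from topic z.
        m_z[z] += sign
        n_z[z] += sign * len(docs[i])
        for w, c in doc_counts[i].items():
            n_zw[z][w] += sign * c

    def topic_probs(i):
        # Conditional p(z | all other assignments); document i must
        # already be removed from the counts when this is called.
        log_p = []
        for z in range(n_topics):
            lp = math.log(m_z[z] + alpha)
            for w, c in doc_counts[i].items():
                for j in range(c):
                    lp += math.log(n_zw[z][w] + beta + j)
            for j in range(len(docs[i])):
                lp -= math.log(n_z[z] + vocab_size * beta + j)
            log_p.append(lp)
        top = max(log_p)  # subtract the max for numerical stability
        weights = [math.exp(lp - top) for lp in log_p]
        total = sum(weights)
        return [w / total for w in weights]

    # Random initial assignment of each document to one topic.
    labels = []
    for i in range(len(docs)):
        z = rng.randrange(n_topics)
        labels.append(z)
        move(i, z, +1)

    # Gibbs sweeps: resample the topic of one document at a time.
    for _ in range(n_iters):
        for i in range(len(docs)):
            move(i, labels[i], -1)
            probs = topic_probs(i)
            labels[i] = rng.choices(range(n_topics), weights=probs)[0]
            move(i, labels[i], +1)

    # Reduced representation: one topic-probability vector per document.
    features = []
    for i in range(len(docs)):
        move(i, labels[i], -1)
        features.append(topic_probs(i))
        move(i, labels[i], +1)
    return features, labels


# Hypothetical short service descriptions (10 distinct words in total).
docs = [
    "weather forecast temperature api".split(),
    "weather rain forecast service".split(),
    "payment card checkout api".split(),
    "payment invoice card service".split(),
]
features, labels = dmm_gibbs(docs, n_topics=3)
```

Here each document is reduced from a 10-word vocabulary space to a 3-dimensional topic vector. In a pipeline like the one the abstract describes, such vectors would then be passed to a K-means or agglomerative clustering routine in place of the original sparse term vectors.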
