Latent semantic analysis for vector space expansion and fuzzy logic-based genetic clustering

Song, Wei; Park, Soon Cheol

doi:10.1007/s10115-009-0191-5

Regular Paper
Published: 05 February 2009

Knowledge and Information Systems
Volume 22, pages 347–369 (2010)

Wei Song¹ &
Soon Cheol Park¹

Abstract

This paper proposes an improved latent semantic analysis (LSA) model for representing text documents and uses a fuzzy logic-based genetic algorithm (FLGA) to cluster them. A standard genetic algorithm (GA) is difficult to apply in the conventional vector space model because its high-dimensional encoding forces it to search for the optimal solution in a complicated space, which is prone to overflow problems. The LSA-based corpus model not only reduces the dimensionality drastically but also creates an underlying semantic structure, which improves its ability to distinguish documents by concept and indirectly improves the GA's clustering ability (genetic clustering). A novel FLGA is proposed in conjunction with this semantic model. Following the nature of biological evolution, several fuzzy controllers adaptively adjust and optimize the behavior of the GA, which effectively prevents premature convergence to a suboptimal solution. The experimental results show that the fuzzy logic controllers enhance the GA's ability to find the global optimum, and that applying the LSA-based text representation to the FLGA further improves its clustering performance.
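
The dimension reduction mentioned in the abstract can be illustrated with standard LSA, that is, a truncated singular value decomposition of a term-document matrix. The sketch below is not the improved model proposed in the paper, whose details are not given on this page; the function names, the toy matrix, and the choice of k = 2 are assumptions made only for illustration.

```python
import numpy as np

def lsa_document_vectors(term_doc, k):
    """Project documents into a k-dimensional latent space (plain LSA).

    term_doc has shape (n_terms, n_docs); the result has shape (n_docs, k).
    """
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep the k largest singular values; each row is one document.
    return (np.diag(s[:k]) @ Vt[:k, :]).T

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy counts (rows: terms, columns: documents).
# Documents 0 and 1 share one vocabulary, documents 2 and 3 another.
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)

docs = lsa_document_vectors(A, k=2)  # 5 term dimensions reduced to 2 concepts
```

In this toy case, documents 0 and 1 end up close together in the reduced space while documents 0 and 2 do not. A clustering algorithm such as a GA would then encode cluster centers in k dimensions rather than in the full term space.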

Author information

Authors and Affiliations

Department of Electronics and Information Engineering, Chonbuk National University, Jeonju, Jeonbuk, 561756, South Korea
Wei Song & Soon Cheol Park

Corresponding author

Correspondence to Wei Song.

About this article

Cite this article

Song, W., Park, S.C. Latent semantic analysis for vector space expansion and fuzzy logic-based genetic clustering. Knowl Inf Syst 22, 347–369 (2010). https://doi.org/10.1007/s10115-009-0191-5

Received: 15 August 2008
Revised: 28 November 2008
Accepted: 28 December 2008
Published: 05 February 2009
Issue Date: March 2010
DOI: https://doi.org/10.1007/s10115-009-0191-5
