Web document clustering based on a new niching Memetic Algorithm, Term-Document Matrix and Bayesian Information Criterion | IEEE Conference Publication | IEEE Xplore

Abstract:

This paper introduces a new description-centric algorithm for web document clustering based on Memetic Algorithms with Niching Methods, Term-Document Matrix and Bayesian Information Criterion. The algorithm defines the number of clusters automatically. The Memetic Algorithm provides a combined global and local search strategy over the solution space, while the niching methods (based on restricted competition replacement and restrictive mating) promote diversity in the population and prevent it from converging too quickly. The Memetic Algorithm uses the K-means algorithm to find the optimum value in a local search space. The Bayesian Information Criterion is used as the fitness function, while FP-Growth is used to reduce the high dimensionality of the vocabulary. The resulting algorithm, called WDC-NMA, was tested with data sets based on Reuters-21578 and DMOZ, obtaining promising results (better precision than a Singular Value Decomposition algorithm). It was also initially evaluated by a group of users.
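
The pairing of a K-means local search with a Bayesian Information Criterion fitness can be illustrated with a minimal sketch. The abstract does not give the exact BIC expression, so the form below, n * ln(SSE / n) + k * ln(n) with lower values treated as better, is an assumption, and the function names (`squared_distance`, `kmeans`, `bic`) are hypothetical. Niching, FP-Growth and the term-document matrix are not modelled here.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, start_centroids, iterations=10):
    """Local search step: refine candidate centroids with plain Lloyd's K-means."""
    centroids = [list(c) for c in start_centroids]
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def bic(points, centroids):
    """Fitness under the ASSUMED form n*ln(SSE/n) + k*ln(n); lower is better."""
    n = len(points)
    sse = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    sse = max(sse, 1e-12)  # guard against log(0) on a perfect fit
    return n * math.log(sse / n) + len(centroids) * math.log(n)


# Two well-separated groups of 2-D points standing in for document vectors.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]

one_cluster = kmeans(points, [(0, 0)])
two_clusters = kmeans(points, [(0, 0), (10, 10)])
print(bic(points, one_cluster), bic(points, two_clusters))
```

In this toy case the two-centroid candidate scores a lower BIC than the one-centroid candidate, which is the kind of comparison an evolutionary search could use to rank individuals with different numbers of clusters.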
Date of Conference: 18-23 July 2010
Date Added to IEEE Xplore: 27 September 2010
Conference Location: Barcelona, Spain
