
Research on mixture language model-based document clustering


Abstract:

Language modeling with semantic smoothing has been proposed as an effective way to improve the quality of document clustering. However, the existing semantic smoothing model is not effective for partitional clustering because it cannot assign appropriate weights to "general" words in a collection. In this paper, inspired by the mixture probability model, we put forward a mixture language model for document clustering. The new model alleviates the effect of "general" words while also integrating context information and addressing polysemy within a document. Based on the new model, an EM algorithm for partitional clustering is presented. The experimental results show that our algorithms improve cluster quality more effectively than previous methods.
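
The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each word is drawn either from a collection-wide background distribution (which absorbs "general" words) or from a cluster-specific multinomial, and it fits the clusters with EM. The function name em_cluster, the fixed mixing weight lam, and the toy corpus are all hypothetical; the context information and polysemy handling mentioned in the abstract are not modeled here.

```python
import math
import random
from collections import Counter


def em_cluster(docs, k, lam=0.5, iters=50, seed=0):
    """Cluster tokenized docs with a background-smoothed multinomial mixture.

    Hypothetical sketch: lam is an assumed, fixed weight of the background
    model; docs must be a non-empty list of non-empty token lists.
    """
    rng = random.Random(seed)
    counts = [Counter(d) for d in docs]
    vocab = sorted({w for c in counts for w in c})
    total = sum(sum(c.values()) for c in counts)
    # Background model: relative word frequency over the whole collection.
    bg = {w: sum(c[w] for c in counts) / total for w in vocab}
    # Random positive initialization of the cluster word distributions.
    theta = []
    for _ in range(k):
        raw = {w: rng.random() + 0.1 for w in vocab}
        s = sum(raw.values())
        theta.append({w: v / s for w, v in raw.items()})
    prior = [1.0 / k] * k
    post = []
    for _ in range(iters):
        # E-step: posterior over clusters for each document, in log space.
        post = []
        for c in counts:
            logp = [
                math.log(prior[j])
                + sum(n * math.log(lam * bg[w] + (1 - lam) * theta[j][w])
                      for w, n in c.items())
                for j in range(k)
            ]
            m = max(logp)
            ex = [math.exp(x - m) for x in logp]
            s = sum(ex)
            post.append([e / s for e in ex])
        # M-step: re-estimate each cluster from the expected counts that are
        # attributed to the cluster component rather than to the background.
        for j in range(k):
            acc = dict.fromkeys(vocab, 1e-9)  # tiny floor keeps probabilities positive
            for c, g in zip(counts, post):
                for w, n in c.items():
                    topic = (1 - lam) * theta[j][w]
                    acc[w] += g[j] * n * topic / (lam * bg[w] + topic)
            s = sum(acc.values())
            theta[j] = {w: v / s for w, v in acc.items()}
            prior[j] = (sum(g[j] for g in post) + 1e-9) / (len(docs) + k * 1e-9)
    # Hard (partitional) assignment: the most probable cluster per document.
    labels = [max(range(k), key=lambda j: g[j]) for g in post]
    return labels, theta


# Toy corpus (made up for illustration); "the" plays the role of a general word.
docs = [
    ["the", "cat", "dog", "cat"],
    ["the", "dog", "cat", "pet"],
    ["the", "stock", "market", "stock"],
    ["the", "market", "trade", "stock"],
]
labels, theta = em_cluster(docs, k=2)
print(labels)
```

Because the background term already accounts for words that are frequent across the whole collection, such words contribute less to the cluster-specific estimates in the M-step. EM only finds a local optimum, so the resulting partition depends on the random initialization.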
Date of Conference: 26-28 August 2008
Date Added to IEEE Xplore: 31 October 2008
Conference Location: Hangzhou, China
