
Text Clustering Based on Granular Computing and Wikipedia

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6954)

Abstract

Text clustering plays an important role in many real-world applications, but it faces various challenges, such as the curse of dimensionality, complex semantics and large data volumes. Many studies have addressed these problems by designing new text representation models and clustering algorithms. However, text clustering remains an open research problem because of the complex properties of text data. In this paper, a text clustering procedure is proposed based on the principle of granular computing with the aid of Wikipedia. The proposed method first identifies text granules, focusing in particular on concepts and words, with the aid of Wikipedia, and then mines latent patterns by computing over these granules. Experimental results on benchmark data sets (20Newsgroups and Reuters-21578) show that the proposed method improves text clustering performance compared with an existing clustering algorithm combined with existing text representation models.
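The abstract describes the procedure only at a high level: identify word and concept granules with the help of Wikipedia, then compute over those granules to find clusters. The Python sketch below illustrates that general idea under explicit assumptions and is not the authors' algorithm. The hand-written `CONCEPT_MAP` is a hypothetical stand-in for a real Wikipedia term-to-concept linker, and combining TF-IDF vectors of both granule types and then running spherical (cosine) k-means is a swapped-in illustrative technique. All names (`granules`, `represent`, `concept_weight`, `spherical_kmeans`) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy stand-in for a Wikipedia-derived term-to-concept mapping.
# A real system would link terms to Wikipedia articles instead.
CONCEPT_MAP = {
    "nasa": "Space_exploration",
    "orbit": "Space_exploration",
    "shuttle": "Space_exploration",
    "goalie": "Ice_hockey",
    "puck": "Ice_hockey",
    "nhl": "Ice_hockey",
}


def granules(doc):
    """Split a document into word granules and concept granules."""
    words = doc.lower().split()
    concepts = [CONCEPT_MAP[w] for w in words if w in CONCEPT_MAP]
    return words, concepts


def tfidf(token_lists):
    """Sparse TF-IDF vectors (dicts) using a smoothed IDF."""
    n = len(token_lists)
    df = Counter(t for toks in token_lists for t in set(toks))
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in Counter(toks).items()}
        for toks in token_lists
    ]


def normalize(vec):
    """Scale a sparse vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else dict(vec)


def cosine(a, b):
    """Dot product of two unit-length sparse vectors, i.e. their cosine."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def represent(docs, concept_weight=1.0):
    """One unit-length vector per document combining word and concept granules."""
    words, concepts = zip(*(granules(d) for d in docs))
    vecs = []
    for wv, cv in zip(tfidf(words), tfidf(concepts)):
        combined = {"w:" + t: v for t, v in wv.items()}
        combined.update({"c:" + t: concept_weight * v for t, v in cv.items()})
        vecs.append(normalize(combined))
    return vecs


def spherical_kmeans(vecs, k, max_iter=20):
    """Cosine k-means with deterministic farthest-first seeding."""
    centroids = [vecs[0]]
    while len(centroids) < k:
        # Next seed: the document least similar to its closest existing seed.
        centroids.append(min(vecs, key=lambda v: max(cosine(v, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vecs]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vecs, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                total = {}
                for v in members:
                    for key, val in v.items():
                        total[key] = total.get(key, 0.0) + val
                centroids[j] = normalize(total)
    return labels


if __name__ == "__main__":
    docs = [
        "nasa shuttle launch into orbit",
        "the orbit of the shuttle",
        "goalie stopped the puck",
        "nhl goalie season",
    ]
    print(spherical_kmeans(represent(docs), k=2))
```

On these four toy documents the sketch prints `[0, 0, 1, 1]`, separating the space-related pair from the hockey-related pair. The concept granules let documents match through a shared concept even where their surface words differ, and `concept_weight` is an assumed knob for trading off concept evidence against word evidence.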




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jing, L., Yu, J. (2011). Text Clustering Based on Granular Computing and Wikipedia. In: Yao, J., Ramanna, S., Wang, G., Suraj, Z. (eds) Rough Sets and Knowledge Technology. RSKT 2011. Lecture Notes in Computer Science, vol 6954. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24425-4_85


  • DOI: https://doi.org/10.1007/978-3-642-24425-4_85

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24424-7

  • Online ISBN: 978-3-642-24425-4

  • eBook Packages: Computer Science, Computer Science (R0)
