Text Clustering Based on Granular Computing and Wikipedia

Jing, Liping; Yu, Jian

doi:10.1007/978-3-642-24425-4_85

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6954)

Abstract

Text clustering plays an important role in many real-world applications, but it faces several challenges, such as the curse of dimensionality, complex semantics, and large data volume. Much research has addressed these problems by designing new text representation models and clustering algorithms. However, text clustering remains an open research problem because of the complicated properties of text data. In this paper, a text clustering procedure is proposed based on the principle of granular computing with the aid of Wikipedia. The proposed method first identifies the text granules, focusing on concepts and words, with the aid of Wikipedia. It then mines the latent patterns by computing over these granules. Experimental results on benchmark data sets (20Newsgroups and Reuters-21578) show that the proposed method improves text clustering performance compared with the existing clustering algorithm combined with existing representation models.
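
The abstract does not give the details of the procedure, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It treats words and concepts as two kinds of granules, merges them into one vector per document, and groups documents by cosine similarity. The `CONCEPT_MAP` dictionary is a hypothetical stand-in for a real Wikipedia lookup, the function names and the `concept_weight` and `threshold` parameters are illustrative, and a simple leader-style threshold clustering is swapped in for whatever clustering algorithm the paper actually uses.

```python
import math
from collections import Counter

# Hypothetical stand-in for a Wikipedia lookup: surface phrase -> concept title.
CONCEPT_MAP = {
    "puma": "Cougar",
    "mountain lion": "Cougar",
    "notebook": "Laptop",
    "laptop": "Laptop",
}

def word_granules(text):
    """Word-level granules: lowercase token counts."""
    return Counter(text.lower().split())

def concept_granules(text, max_len=2):
    """Concept-level granules: n-grams (up to max_len tokens) found in CONCEPT_MAP."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in CONCEPT_MAP:
                counts["C:" + CONCEPT_MAP[phrase]] += 1
    return counts

def represent(text, concept_weight=1.0):
    """Merge word and concept granules into one sparse vector (assumed scheme)."""
    vec = Counter(word_granules(text))
    for concept, n in concept_granules(text).items():
        vec[concept] += concept_weight * n
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def leader_cluster(vectors, threshold):
    """Assign each vector to the first cluster whose leader is similar enough."""
    leaders, labels = [], []
    for vec in vectors:
        for label, leader in enumerate(leaders):
            if cosine(vec, leader) >= threshold:
                labels.append(label)
                break
        else:
            leaders.append(vec)
            labels.append(len(leaders) - 1)
    return labels

docs = [
    "puma sighting",
    "mountain lion tracks",
    "laptop battery",
    "notebook battery life",
]
words_only = leader_cluster([word_granules(d) for d in docs], threshold=0.25)
with_concepts = leader_cluster([represent(d) for d in docs], threshold=0.25)
print(words_only)
print(with_concepts)
```

In this toy data the first two documents share no words, so a words-only representation keeps them apart, while the shared concept granule lets the combined representation place them in the same cluster.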

Author information

Authors and Affiliations

School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China
Liping Jing & Jian Yu

Editor information

Editors and Affiliations

Department of Computer Science, University of Regina, Regina, S4S 0A2, Saskatchewan, Canada
JingTao Yao
Department of Applied Computer Science, University of Winnipeg, R3B 2E9, Winnipeg, Canada
Sheela Ramanna
Institute of Computer Science & Technology, Chongqing University of Posts and Telecommunications, 400065, Chongqing, P.R. China
Guoyin Wang
Chair of Computer Science, University of Rzeszów, Poland
Zbigniew Suraj

About this paper

Cite this paper

Jing, L., Yu, J. (2011). Text Clustering Based on Granular Computing and Wikipedia. In: Yao, J., Ramanna, S., Wang, G., Suraj, Z. (eds) Rough Sets and Knowledge Technology. RSKT 2011. Lecture Notes in Computer Science, vol 6954. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24425-4_85

DOI: https://doi.org/10.1007/978-3-642-24425-4_85
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24424-7
Online ISBN: 978-3-642-24425-4
eBook Packages: Computer Science, Computer Science (R0)
