
Granules of Words to Represent Text: An Approach Based on Fuzzy Relations and Spectral Clustering

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7336)

Abstract

The amount of data available in semi-structured or unstructured format is growing exponentially. Text mining aims to discover knowledge from data of this type. Most work in this area represents texts with the bag-of-words model. This representation, although effective, limits the quality of the knowledge discovered because it cannot capture essential characteristics of this type of data, such as semantics and context. The granular computing paradigm has been shown to be effective for complex information-processing problems and can produce significant results in large-scale environments such as the Web. This paper explores the granulation of words with a view to improving text representation. We use fuzzy relations and spectral clustering in this process and present some results.
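The abstract does not specify which fuzzy relation or which spectral clustering variant the paper uses, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a Jaccard-style co-occurrence degree as the fuzzy relation between words and a two-way split on the sign of the second eigenvector of the normalized affinity matrix (in the spirit of normalized-cut clustering). The toy corpus and the names `fuzzy_relation`, `spectral_bipartition`, and `word_granules` are hypothetical.

```python
import math
import random


def fuzzy_relation(docs):
    """Return (vocab, R), where R[i][j] in [0, 1] is an assumed Jaccard-style
    fuzzy degree to which words i and j co-occur in the same documents."""
    vocab = sorted({w for d in docs for w in d})
    occ = {w: {k for k, d in enumerate(docs) if w in d} for w in vocab}
    n = len(vocab)
    R = [[0.0] * n for _ in range(n)]
    for i, wi in enumerate(vocab):
        for j, wj in enumerate(vocab):
            if i != j:
                R[i][j] = len(occ[wi] & occ[wj]) / len(occ[wi] | occ[wj])
    return vocab, R


def spectral_bipartition(W, iters=500, seed=0):
    """Split nodes into two groups by the sign of the second eigenvector of
    M = D^-1/2 W D^-1/2, found by power iteration with deflation.
    Assumes every node has a non-zero degree."""
    n = len(W)
    deg = [sum(row) for row in W]
    s = [math.sqrt(d) for d in deg]
    M = [[W[i][j] / (s[i] * s[j]) for j in range(n)] for i in range(n)]
    # The top eigenvector of M (eigenvalue 1) is sqrt(deg), normalized.
    norm = math.sqrt(sum(deg))
    v1 = [x / norm for x in s]
    rng = random.Random(seed)
    x = [rng.uniform(-1, 1) for _ in range(n)]
    for _ in range(iters):
        # Multiply by (M + I); the shift makes all eigenvalues non-negative.
        y = [x[i] + sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Remove the component along the trivial eigenvector, then renormalize.
        dot = sum(a * b for a, b in zip(y, v1))
        y = [a - dot * b for a, b in zip(y, v1)]
        length = math.sqrt(sum(a * a for a in y))
        x = [a / length for a in y]
    # Rescale by D^-1/2 and split on the sign.
    return [0 if x[i] / s[i] >= 0 else 1 for i in range(n)]


def word_granules(docs):
    """Group the vocabulary into two granules of related words."""
    vocab, R = fuzzy_relation(docs)
    labels = spectral_bipartition(R)
    groups = {0: set(), 1: set()}
    for word, label in zip(vocab, labels):
        groups[label].add(word)
    return [groups[0], groups[1]]


# Hypothetical toy corpus: two topics linked by one bridging document.
DOCS = [
    ["fuzzy", "set", "relation"],
    ["fuzzy", "relation", "membership"],
    ["set", "membership", "fuzzy"],
    ["graph", "cut", "eigenvector"],
    ["graph", "eigenvector", "cluster"],
    ["cut", "cluster", "graph"],
    ["fuzzy", "graph"],
]

granules = word_granules(DOCS)
print(granules)
```

Each resulting granule could then replace its member words as a single feature in a document vector, which is one way a granular representation might differ from a plain bag of words. A practical version would use a linear-algebra library for the eigendecomposition and support more than two granules.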




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Castro, P.F., Xexéo, G.B. (2012). Granules of Words to Represent Text: An Approach Based on Fuzzy Relations and Spectral Clustering. In: Murgante, B., et al. Computational Science and Its Applications – ICCSA 2012. ICCSA 2012. Lecture Notes in Computer Science, vol 7336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31128-4_28


  • DOI: https://doi.org/10.1007/978-3-642-31128-4_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31127-7

  • Online ISBN: 978-3-642-31128-4

  • eBook Packages: Computer Science, Computer Science (R0)
