Abstract
The amount of data available in semi-structured or unstructured form is growing exponentially. Text mining aims to discover knowledge from data of this type. Most work in the area represents texts with the bag-of-words model. This representation, although effective, limits the quality of the knowledge discovered because it cannot capture essential characteristics of textual data such as semantics and context. The granular computing paradigm has proven effective for complex information-processing problems and can produce significant results in large-scale environments such as the Web. This paper explores the granulation of words with a view to improving text representation. We use fuzzy relations and spectral clustering in this process and present some results.
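The abstract does not specify which fuzzy relation or which spectral clustering variant the paper uses, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a Jaccard-style co-occurrence degree as the fuzzy relation between words and a simple two-way spectral split on the normalized affinity matrix. The toy corpus and the names `fuzzy_relation`, `spectral_bipartition`, and `word_granules` are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy corpus: each document is a list of words.
DOCS = [
    ["fuzzy", "set", "relation"],
    ["fuzzy", "relation", "membership"],
    ["set", "membership", "fuzzy"],
    ["graph", "cut", "eigenvector"],
    ["graph", "eigenvector", "laplacian"],
    ["cut", "laplacian", "graph"],
    ["fuzzy", "graph"],  # one bridging document keeps the word graph connected
]

def fuzzy_relation(docs):
    """Return (words, W), where W[i][j] in [0, 1] is an assumed fuzzy relation:
    (#docs containing both words) / (#docs containing either word)."""
    words = sorted({w for d in docs for w in d})
    occ = {w: {k for k, d in enumerate(docs) if w in d} for w in words}
    n = len(words)
    W = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        both = len(occ[words[i]] & occ[words[j]])
        either = len(occ[words[i]] | occ[words[j]])
        W[i][j] = W[j][i] = both / either
    return words, W

def spectral_bipartition(W, iters=200):
    """Split nodes by the sign of the second eigenvector of D^-1/2 W D^-1/2,
    found by power iteration with the known leading eigenvector projected out.
    Assumes every word has a nonzero relation to at least one other word."""
    n = len(W)
    s = [sqrt(sum(row)) for row in W]  # square roots of the node degrees
    # Shifted matrix (I + D^-1/2 W D^-1/2) / 2 has eigenvalues in [0, 1].
    M = [[(W[i][j] / (s[i] * s[j]) + (1.0 if i == j else 0.0)) / 2
          for j in range(n)] for i in range(n)]
    norm = sqrt(sum(x * x for x in s))
    top = [x / norm for x in s]           # leading eigenvector is D^1/2 * 1
    v = [float(i + 1) for i in range(n)]  # arbitrary deterministic start
    for _ in range(iters):
        v = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        dot = sum(a * b for a, b in zip(v, top))
        v = [a - dot * b for a, b in zip(v, top)]  # remove leading component
        length = sqrt(sum(a * a for a in v))
        v = [a / length for a in v]
    return [x >= 0 for x in v]

def word_granules(docs):
    """Group words into two granules using the sketch above."""
    words, W = fuzzy_relation(docs)
    side = spectral_bipartition(W)
    return [{w for w, s in zip(words, side) if s},
            {w for w, s in zip(words, side) if not s}]

granules = word_granules(DOCS)
print(granules)
```

On this toy corpus the two groups of words are linked only through the single bridging document, so the sign of the second eigenvector separates them. A realistic setting would need more than two granules and a sparse eigensolver rather than dense power iteration.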
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Castro, P.F., Xexéo, G.B. (2012). Granules of Words to Represent Text: An Approach Based on Fuzzy Relations and Spectral Clustering. In: Murgante, B., et al. Computational Science and Its Applications – ICCSA 2012. ICCSA 2012. Lecture Notes in Computer Science, vol 7336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31128-4_28
DOI: https://doi.org/10.1007/978-3-642-31128-4_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31127-7
Online ISBN: 978-3-642-31128-4
eBook Packages: Computer Science (R0)