Granules of Words to Represent Text: An Approach Based on Fuzzy Relations and Spectral Clustering

Castro, Patrícia F.; Xexéo, Geraldo B.

doi:10.1007/978-3-642-31128-4_28

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7336)

Abstract

The amount of data available in semi-structured or unstructured format grows exponentially. Text mining aims to discover knowledge from data of this type. Most work in this area represents texts with the bag-of-words model. This representation, although effective, limits the quality of the knowledge discovered because it cannot capture essential characteristics of textual data such as semantics and context. The granular computing paradigm has been shown to be effective for complex information-processing problems and can produce significant results in large-scale environments such as the Web. This paper explores the granulation of words, with a view to using the resulting granules to improve text representation. We use fuzzy relations and spectral clustering in this process and present some results.
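The abstract does not say how the fuzzy relation is defined or which spectral clustering variant is used, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a Jaccard-style fuzzy similarity between words (the share of documents containing both words among those containing either) and a two-way split on the sign of the second eigenvector of the symmetric normalized Laplacian. The toy corpus and the names `fuzzy_relation`, `spectral_bipartition`, and `word_granules` are hypothetical.

```python
import numpy as np

# Hypothetical toy corpus: each document is a list of already-tokenized words.
DOCS = [
    ["fuzzy", "set", "membership"],
    ["fuzzy", "membership", "relation"],
    ["set", "relation", "fuzzy"],
    ["graph", "eigenvector", "laplacian"],
    ["graph", "laplacian", "cut"],
    ["eigenvector", "cut", "graph"],
    ["fuzzy", "graph"],  # one bridging document keeps the word graph connected
]


def fuzzy_relation(docs):
    """Return (vocab, R), where R[i, j] in [0, 1] is an assumed Jaccard-style
    fuzzy similarity: |docs with both words| / |docs with either word|."""
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    # Binary word-by-document incidence matrix.
    X = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d:
            X[index[w], j] = 1.0
    both = X @ X.T                       # co-occurrence counts
    counts = X.sum(axis=1)               # document frequency of each word
    either = counts[:, None] + counts[None, :] - both
    return vocab, both / either          # either >= 1: every word occurs somewhere


def spectral_bipartition(R):
    """Split the words in two using the sign of the second-smallest eigenvector
    of L_sym = I - D^{-1/2} W D^{-1/2}. Assumes every word has positive degree."""
    W = R.copy()
    np.fill_diagonal(W, 0.0)             # drop self-similarity
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)   # eigenvalues come back in ascending order
    return eigvecs[:, 1] >= 0


def word_granules(docs):
    """Group the vocabulary into two granules of related words."""
    vocab, R = fuzzy_relation(docs)
    labels = spectral_bipartition(R)
    first = sorted(w for w, flag in zip(vocab, labels) if flag)
    second = sorted(w for w, flag in zip(vocab, labels) if not flag)
    return [first, second]


if __name__ == "__main__":
    for granule in word_granules(DOCS):
        print(granule)
```

A document could then be described by the granules its words fall into rather than by the individual words, which is one way such granules might feed into a richer text representation. Producing more than two granules would typically require several eigenvectors followed by a clustering step such as k-means, which this sketch leaves out.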

Author information

Authors and Affiliations

Departamento de Engenharia de Sistemas e Computação, COPPE/UFRJ, Rio de Janeiro, Brasil
Patrícia F. Castro & Geraldo B. Xexéo
Departamento de Ciência da Computação, IM/UFRJ, Rio de Janeiro, Brasil
Geraldo B. Xexéo

Editor information

Editors and Affiliations

Laboratory of Urban and Territorial Systems, University of Basilicata, 10, Viale dell’Ateneo Lucano, 85100, Potenza, Italy
Beniamino Murgante
Department of Mathematics and Computer Science, University of Perugia, Via Vanvitelli 1, 06123, Perugia, Italy
Osvaldo Gervasi
Department of Cyber Security Science, Federal University of Technology, Gidan Kwano Campus, Minna, Nigeria
Sanjay Misra
Faculty of Engineering, Department of Electronics Engineering and Telecommunications, State University of Rio de Janeiro, Rua Sao Francisco Xavier, 524, 5o. andar, sala 5145-F, Maracana, 20550-013, Rio de Janeiro, RJ, Brazil
Nadia Nedjah
Department of Production and Systems, University of Minho, Campus de Gualtar, 4710-057, Braga, Portugal
Ana Maria A. C. Rocha
School of Business Systems, Monash University, 3800, Clayton, VIC, Australia
David Taniar
Department of Intelligent Informatics, Kyushu Sangyo University, 2-3-1 Matsukadai, Higashi-ku, 813-8503, Fukuoka, Japan
Bernady O. Apduhan

Cite this paper

Castro, P.F., Xexéo, G.B. (2012). Granules of Words to Represent Text: An Approach Based on Fuzzy Relations and Spectral Clustering. In: Murgante, B., et al. Computational Science and Its Applications – ICCSA 2012. ICCSA 2012. Lecture Notes in Computer Science, vol 7336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31128-4_28

DOI: https://doi.org/10.1007/978-3-642-31128-4_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31127-7
Online ISBN: 978-3-642-31128-4
eBook Packages: Computer Science, Computer Science (R0)