Discovering Word Senses from Text Using Random Indexing

Chatterjee, Niladri; Mohan, Shiwali

doi:10.1007/978-3-540-78135-6_25

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4919)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

Random Indexing is a novel technique for dimensionality reduction when building a Word Space model from a given text. This paper explores the application of Random Indexing to discovering word senses from text. The words appearing in the text are plotted onto a multi-dimensional Word Space using Random Indexing, and the geometric distance between words is used as an indicator of their semantic similarity. The soft Clustering by Committee (CBC) algorithm is used to cluster similar words. The present work shows that the Word Space model can be used effectively to determine the similarity index required for clustering. The approach does not require parsers, lexicons, or any other resources traditionally used in word sense disambiguation. The proposed approach has been applied to the TASA corpus, and encouraging results have been obtained.
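
The Word Space construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dimensionality (256), the number of non-zero entries per index vector (8), the context window (2 words on each side), the toy sentence, and the use of cosine similarity as the geometric measure are all assumptions chosen for illustration, and the function names are hypothetical. The CBC clustering step is not shown.

```python
import math
import random


def make_index_vector(dim, nonzeros, rng):
    """Sparse ternary index vector: a few random +1/-1 entries, zeros elsewhere."""
    vec = [0.0] * dim
    for pos in rng.sample(range(dim), nonzeros):
        vec[pos] = rng.choice((-1.0, 1.0))
    return vec


def build_word_space(tokens, dim=256, nonzeros=8, window=2, seed=0):
    """Random Indexing sketch: each word's context vector is the sum of the
    index vectors of the words that occur within `window` positions of it."""
    rng = random.Random(seed)
    # One fixed random index vector per distinct word (assumed parameters).
    index = {w: make_index_vector(dim, nonzeros, rng) for w in sorted(set(tokens))}
    context = {w: [0.0] * dim for w in index}
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue
            neighbour = index[tokens[j]]
            target = context[word]
            for k in range(dim):
                target[k] += neighbour[k]
    return context


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy corpus, invented for illustration only.
tokens = ("the cat drinks milk the dog drinks water "
          "the cat eats fish the dog eats meat").split()
space = build_word_space(tokens)
sim_cat_dog = cosine(space["cat"], space["dog"])
print(f"similarity(cat, dog) = {sim_cat_dog:.3f}")
```

Because the index vectors are sparse and nearly orthogonal, the context vectors stay at a fixed, low dimensionality no matter how large the vocabulary grows. A similarity score of this kind is what a clustering algorithm such as CBC would consume.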

Author information

Authors and Affiliations

Department of Mathematics, Indian Institute of Technology Delhi, New Delhi, India, 110016
Niladri Chatterjee
Yahoo! Research and Development India, Bangalore, India, 560 071
Shiwali Mohan

Editor information

Alexander Gelbukh

Cite this paper

Chatterjee, N., Mohan, S. (2008). Discovering Word Senses from Text Using Random Indexing. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2008. Lecture Notes in Computer Science, vol 4919. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78135-6_25

DOI: https://doi.org/10.1007/978-3-540-78135-6_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78134-9
Online ISBN: 978-3-540-78135-6
eBook Packages: Computer Science, Computer Science (R0)
