ABSTRACT
Exploration of distributional semantics for NLP tasks in Indian languages has been scarce. This work presents a comparative analysis of two recent, high-performing distributional semantics techniques: word2vec and JoBimText. The task considered is the lexical expansion of words in Hindi. The techniques are evaluated through a manual similarity assessment of the lexical expansions they produce. The word2vec framework is observed to perform better than JoBimText for various corpus sizes. Analysis of the results also offers insights into the performance of the systems on various word types.