DOI: 10.1145/2838706.2838713
short-paper

word2vec or JoBimText?: A Comparison for Lexical Expansion of Hindi Words

Published: 04 December 2015

Abstract

Distributional semantics has rarely been explored for NLP tasks in Indian languages. This work presents a comparative analysis of two recent, high-performing distributional semantics techniques, namely word2vec and JoBimText, on the task of lexical expansion of Hindi words. The techniques are evaluated through a manual similarity assessment of the lexical expansions they produce. The word2vec framework performs better than JoBimText across the corpus sizes considered. Analysis of the results also offers insights into how the systems perform on various word types.

Cited By

  • (2023) Code-mixed Hindi-English text correction using fuzzy graph and word embedding. Expert Systems. DOI: 10.1111/exsy.13328. Online publication date: 14-May-2023.
  • (2021) KL-NF technique for sentiment classification. Multimedia Tools and Applications, 80:13 (19885-19907). DOI: 10.1007/s11042-021-10559-y. Online publication date: 1-May-2021.
  • (2018) Multi-class Classification of Sentiments in Hindi Sentences Based on Intensities. Towards Extensible and Adaptable Methods in Computing (251-266). DOI: 10.1007/978-981-13-2348-5_19. Online publication date: 5-Nov-2018.

Published In

FIRE '15: Proceedings of the 7th Annual Meeting of the Forum for Information Retrieval Evaluation
December 2015
57 pages
ISBN:9781450340045
DOI:10.1145/2838706
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

FIRE '15
FIRE '15: Forum for Information Retrieval Evaluation
December 4 - 6, 2015
Gandhinagar, India

Acceptance Rates

FIRE '15 Paper Acceptance Rate 12 of 42 submissions, 29%;
Overall Acceptance Rate 19 of 64 submissions, 30%
