DOI: 10.1145/2838706.2838713
short-paper

word2vec or JoBimText?: A Comparison for Lexical Expansion of Hindi Words

Published: 04 December 2015

ABSTRACT

Distributional semantics has rarely been explored for NLP tasks in Indian languages. This work presents a comparative analysis of two recent, high-performing distributional semantics techniques, word2vec and JoBimText, on the task of lexical expansion of Hindi words. The techniques are evaluated through a manual similarity assessment of the lexical expansions they produce. The word2vec framework performs better than JoBimText across the corpus sizes considered. Analysis of the results also offers insights into how the two systems perform on various word types.
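
As a rough illustration of what lexical expansion means here, the sketch below ranks the nearest neighbours of a word by cosine similarity over dense word vectors, which is the general way embedding models such as word2vec are queried. It is not the authors' pipeline: the function names `cosine` and `expand`, and the three-dimensional `TOY_VECTORS`, are hypothetical and hand-made for the example, not values learned from any corpus.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def expand(word, vectors, top_k=3):
    """Return the top_k most similar words to `word` with their scores."""
    target = vectors[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy vectors (assumption: invented for illustration only).
TOY_VECTORS = {
    "पानी": [0.90, 0.10, 0.00],   # water
    "जल": [0.85, 0.15, 0.05],     # water (formal register)
    "नदी": [0.70, 0.30, 0.10],    # river
    "किताब": [0.00, 0.20, 0.90],  # book
}

if __name__ == "__main__":
    for neighbour, score in expand("पानी", TOY_VECTORS, top_k=2):
        print(neighbour, round(score, 3))
```

A manual assessment like the one described in the abstract would then have annotators judge whether each returned neighbour is genuinely similar to the query word.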


  • Published in

    FIRE '15: Proceedings of the 7th Annual Meeting of the Forum for Information Retrieval Evaluation
    December 2015
    57 pages
    ISBN:9781450340045
    DOI:10.1145/2838706

    Copyright © 2015 ACM


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • short-paper
    • Research
    • Refereed limited

    Acceptance Rates

    FIRE '15 paper acceptance rate: 12 of 42 submissions, 29%. Overall acceptance rate: 19 of 64 submissions, 30%.
