
Shrinking digital gap through automatic generation of WordNet for Indian languages

  • Open Forum
  • AI & SOCIETY

Abstract

Hindi ranks fourth in the world by number of speakers. Despite this, it has less than 0.1 % presence on the web because of the lack of competent lexical resources, a key reason behind the digital gap created by the language barrier among the Indian masses. Following in the footsteps of the renowned lexical resource English WordNet, 18 Indian languages began building WordNets under the IndoWordNet project. India is a multilingual country with around 122 languages and 234 mother tongues. Many Indian languages still lack any reliable lexical resource, and the coverage of many WordNets still under development remains far below the average value of 25,792. The tedious manual process and its high cost are the major reasons behind this unsatisfactory coverage and slow progress. In this paper, we discuss the socio-cultural and economic impact of providing Internet accessibility and present an approach for the automatic generation of WordNets to address the lack of competent lexical resources. Problems that arise when deviating from the traditional approach of compilation by lexicographers, such as accuracy, the association of language-specific glosses and examples, and incorrect back-translations, are resolved by utilising the Wikipedias available for Indian languages.


Fig. 1
Fig. 2


Notes

  1. Content languages survey on 28th April, 2014: www.w3techs.com/technologies/overview/content_language/all

  2. Ethnologue Statistics for Language: www.ethnologue.com/statistics/size

  3. Anglabharti: http://www.cse.iitk.ac.in/users/rmk/mission/mission.htm

  4. Definition of Lexical Database: www-01.sil.org/linguistics/glossaryoflinguisticterms/WhatIsALexicalDatabase.htm

  5. English Wikipedia: www.en.wikipedia.org/wiki/Wikipedia:About

  6. About Hindi WordNet on 9 March 2014: www.cfilt.iitb.ac.in/wordnet/webhwn

  7. Indo WordNet Statistics on 9 March 2014 @ 20:27:01: http://www.cfilt.iitb.ac.in/wordnet/webhwn/iwn_stats.php

  8. Digital Manifesto: www.ndtv.com/article/india/njp-launches-website-to-invite-manifesto-suggestions-434179

  9. Social media in fight against corruption: www.techinasia.com/social-media-played-a-major-role-in-india-fight-against-corruption/

  10. Social media in violence against women: www.techinasia.com/social-media-played-a-major-role-in-india-fight-against-corruption/

  11. Google Person Finder: www.google.org/personfinder/2013-uttrakhand-floods

  12. Census of India (statement 2, statement 3, statement 8): http://censusindia.gov.in/Census_Data_2001/Census_Data_Online/Language/data_on_language.html

  13. Article in The Hindu, 5 Jan 2014: http://www.thehindu.com/todays-paper/tp-national/tp-kerala/eliteracy-will-reduce-digital-divide-pm/article5540324.ece

  14. Department of Electronics and Information Technology-IT for masses: http://deity.gov.in/content/it-masses-0

  15. MNREGA scam in UP, 25 Feb 2014: www.timesofindia.indiatimes.com/india/cbi-to-question-officials-for-MNREGA-scam-in-UP/

  16. List of Wikipedia and statistics: www.meta.wikimedia.org/wiki/list_of_wikipedias

  17. CIS-Report for Indian languages Wikipedia: http://cis-india.org/a2k/blog/indian-language-wikipedia-statistics


Author information

Correspondence to Amita Jain.

About this article


Cite this article

Jain, A., Tayal, D.K. & Rai, S. Shrinking digital gap through automatic generation of WordNet for Indian languages. AI & Soc 30, 215–222 (2015). https://doi.org/10.1007/s00146-014-0548-5

  • DOI: https://doi.org/10.1007/s00146-014-0548-5
