Abstract
In this paper, we propose an automatic tool for creating dictionary entries of Tamil words for the Universal Networking Language (UNL). Dictionaries play a crucial role in many NLP applications, especially in machine translation (MT) systems. However, creating dictionary entries manually is a time-consuming process. Moreover, a UNL dictionary carries additional features such as semantic constraints and attributes. To address this complex task, we propose a domain-specific approach in which dictionary entries are created automatically from other word-based resources such as WordNet, bilingual dictionaries, and the UNL ontology. As the source of domain-specific words, we use domain-specific documents from the web. Three resources are used to extract meaningful words from the documents: a morphological analyzer, to extract the grammatical information of a given word; WordNet, to identify its semantics; and the UNL KB (Knowledge Base), to obtain its semantic constraints. Semantic constraints help to determine the tense, mood, and aspect of the given word. Sometimes the automatic process does not determine these semantic constraints correctly. In such cases, a semantic-similarity-based filtering method using the UNL ontology removes the incorrect dictionary entries. Thus, the tool handles words semantically and also improves the correctness of the dictionary.
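The filtering step can be illustrated with a minimal sketch. The abstract does not name the similarity measure, so the code below assumes a Wu-Palmer-style score over an is-a hierarchy. The toy `PARENT` hierarchy, the `Candidate` fields, the `filter_entries` helper, and the 0.6 threshold are all hypothetical and are not taken from the paper or from the UNL ontology itself.

```python
from dataclasses import dataclass

# Toy is-a hierarchy standing in for the UNL ontology (hypothetical data).
PARENT = {
    "thing": None,
    "living thing": "thing",
    "animal": "living thing",
    "plant": "living thing",
    "artifact": "thing",
    "building": "artifact",
    "vehicle": "artifact",
}


def ancestors(concept):
    """Return the path from a concept up to the root, concept first."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Depth in the hierarchy; the root has depth 1."""
    return len(ancestors(concept))


def similarity(a, b):
    """Wu-Palmer-style score: 2 * depth(lcs) / (depth(a) + depth(b))."""
    path_b = set(ancestors(b))
    # First shared node on the way up from a is the lowest common ancestor.
    lcs = next(c for c in ancestors(a) if c in path_b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


@dataclass
class Candidate:
    headword: str    # Tamil word
    gloss: str       # English equivalent from a bilingual dictionary
    constraint: str  # concept proposed as the semantic constraint
    expected: str    # concept suggested by WordNet for the same sense


def filter_entries(candidates, threshold=0.6):
    """Keep candidates whose proposed constraint is close to the expected concept."""
    return [c for c in candidates
            if similarity(c.constraint, c.expected) >= threshold]


candidates = [
    Candidate("வீடு", "house", constraint="building", expected="building"),
    Candidate("வீடு", "house", constraint="animal", expected="building"),
]
kept = filter_entries(candidates)
```

In this toy run, the candidate whose constraint matches the expected concept is kept, while the one constrained as "animal" scores below the threshold and is dropped. A real system would replace the toy hierarchy with the UNL ontology and tune the threshold on held-out entries.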
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
J, G., Parthasarathi, R., V, G.T. (2015). Automatic Construction of Tamil UNL Dictionary. In: Prasath, R., Vuppala, A., Kathirvalavakumar, T. (eds) Mining Intelligence and Knowledge Exploration. MIKE 2015. Lecture Notes in Computer Science, vol 9468. Springer, Cham. https://doi.org/10.1007/978-3-319-26832-3_58
DOI: https://doi.org/10.1007/978-3-319-26832-3_58
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26831-6
Online ISBN: 978-3-319-26832-3
eBook Packages: Computer Science, Computer Science (R0)