Abstract
This paper describes a technique for automatically creating dictionaries using a sub-symbolic representation of words in a cross-language context. Semantic relationships among the words of two languages are extracted from aligned bilingual text corpora by applying Latent Semantic Analysis to the matrices that represent term co-occurrences in aligned text fragments. The technique finds the “best translation” of a word according to a suitably defined geometric distance in an automatically created semantic space. In experiments, translation correctness reached 95% in the best case.
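The pipeline summarised in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy English/Italian corpus, the function names, the raw-count weighting, the number of retained dimensions k, and the use of cosine similarity as the geometric measure are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

# Hypothetical toy corpus of aligned fragments (source, target).
pairs = [
    ("the dog sleeps", "il cane dorme"),
    ("the cat sleeps", "il gatto dorme"),
    ("the dog eats", "il cane mangia"),
    ("the cat eats", "il gatto mangia"),
]

def build_term_fragment_matrix(pairs):
    """Rows are terms of both languages; columns are aligned fragment pairs."""
    src_vocab = sorted({w for s, _ in pairs for w in s.split()})
    tgt_vocab = sorted({w for _, t in pairs for w in t.split()})
    # Tag each term with its language so identical spellings do not collide.
    terms = [("src", w) for w in src_vocab] + [("tgt", w) for w in tgt_vocab]
    index = {term: i for i, term in enumerate(terms)}
    A = np.zeros((len(terms), len(pairs)))
    for j, (s, t) in enumerate(pairs):
        for w in s.split():
            A[index[("src", w)], j] += 1  # raw counts (assumed weighting)
        for w in t.split():
            A[index[("tgt", w)], j] += 1
    return A, terms

def term_vectors(A, k):
    """Truncated SVD: keep the k largest singular values (the LSA step)."""
    U, S, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] * S[:k]  # one k-dimensional vector per term

def best_translation(word, terms, vectors):
    """Return the target-language term closest to `word` by cosine similarity."""
    q = vectors[terms.index(("src", word))]
    best, best_score = None, -np.inf
    for i, (lang, w) in enumerate(terms):
        if lang != "tgt":
            continue
        v = vectors[i]
        denom = np.linalg.norm(q) * np.linalg.norm(v)
        score = float(q @ v / denom) if denom > 0 else 0.0
        if score > best_score:
            best, best_score = w, score
    return best, best_score

A, terms = build_term_fragment_matrix(pairs)
vectors = term_vectors(A, k=3)  # k is an assumed parameter
print(best_translation("dog", terms, vectors))
```

In this toy corpus "dog" and "cane" occur in exactly the same fragments, so they receive the same vector and are paired. The matrix here has rank 3, so k=3 discards nothing; on a real corpus k would be far smaller than the vocabulary size, and the choice of k and of the distance measure would need tuning.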
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vella, F., Pilato, G., Motisi, I., Gaglio, S. (2006). Automatic Dictionary Creation by Sub-symbolic Encoding of Words. In: Apolloni, B., Marinaro, M., Nicosia, G., Tagliaferri, R. (eds) Neural Nets. WIRN 2005, NAIS 2005. Lecture Notes in Computer Science, vol 3931. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11731177_17
DOI: https://doi.org/10.1007/11731177_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33183-4
Online ISBN: 978-3-540-33184-1
eBook Packages: Computer Science, Computer Science (R0)