Abstract
A non-linear semantic mapping procedure is implemented for cross-language text matching at the sentence level. The method relies on a non-linear space reduction technique, which is used to construct semantic embeddings of multilingual sentence collections. An independent embedding is constructed for each language in the collection, and the similarities among the resulting semantic representations are used for cross-language matching. The proposed method is shown to outperform conventional cross-language information retrieval methods.
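The matching idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it skips the non-linear space reduction entirely and stands in for each per-language embedding with a plain vector of cosine similarities to a set of anchor sentences. The use of a parallel anchor set, the toy sentences, and the helper names (`bow`, `profile`, `match`) are all assumptions made for illustration.

```python
import math
from collections import Counter

def bow(sentence):
    """Bag-of-words term counts for a whitespace-tokenised sentence."""
    return Counter(sentence.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def profile(sentence, anchors):
    """Represent a sentence by its similarity to each anchor sentence
    of its OWN language (one independent representation per language)."""
    s = bow(sentence)
    return [cosine(s, bow(a)) for a in anchors]

def profile_cosine(p, q):
    """Cosine similarity between two dense similarity profiles."""
    dot = sum(x * y for x, y in zip(p, q))
    np_ = math.sqrt(sum(x * x for x in p))
    nq = math.sqrt(sum(y * y for y in q))
    return dot / (np_ * nq) if np_ and nq else 0.0

def match(query, query_anchors, candidates, candidate_anchors):
    """Return the index of the candidate whose profile is closest to the
    query's profile. Assumes anchor i in one language is a translation of
    anchor i in the other, so the two profiles are comparable."""
    qp = profile(query, query_anchors)
    scores = [profile_cosine(qp, profile(c, candidate_anchors))
              for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)

# Hypothetical toy data: three parallel anchor sentences per language.
EN_ANCHORS = ["the cat sleeps", "the dog runs", "the fish swims"]
ES_ANCHORS = ["el gato duerme", "el perro corre", "el pez nada"]
ES_CANDIDATES = ["un perro corre", "un gato duerme", "un pez nada"]

best = match("a cat sleeps", EN_ANCHORS, ES_CANDIDATES, ES_ANCHORS)
print(ES_CANDIDATES[best])
```

No vocabulary is shared between the two languages here; the query and the candidates are compared only through their positions relative to the anchors. A fuller treatment would replace `profile` with a reduced-dimension embedding built separately for each language.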
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Banchs, R.E., Costa-jussà, M.R. (2010). A Non-linear Semantic Mapping Technique for Cross-Language Sentence Matching. In: Loftsson, H., Rögnvaldsson, E., Helgadóttir, S. (eds) Advances in Natural Language Processing. NLP 2010. Lecture Notes in Computer Science, vol 6233. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14770-8_8
DOI: https://doi.org/10.1007/978-3-642-14770-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14769-2
Online ISBN: 978-3-642-14770-8
eBook Packages: Computer Science, Computer Science (R0)