Abstract
The Web contains large numbers of well-translated sentence pairs, which can be used to translate sentences between languages. Searching the Web for the closest sentence translations to obtain high-quality translations is an interesting problem that has recently attracted significant attention. However, developing an effective approach is not straightforward, because the task depends heavily on the similarity model used to quantify the similarity between two sentences. In this paper, we propose several optimization techniques to address this problem. We devise a phrase-based model to quantify the similarity between two sentences. We judiciously select high-quality phrases from sentences; these phrases capture the key features of the sentences and can thus be used to quantify the similarity between them. Experimental results show that our approach has performance advantages over state-of-the-art sentence matching methods.
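To make the idea of phrase-based similarity concrete, the following is a minimal sketch and not the method of the paper, whose phrase selection and scoring are not specified in the abstract. It assumes that phrases are contiguous word n-grams, that a phrase is "high-quality" when it is rare in the collection (an IDF-style weight), and that similarity is the weighted Jaccard overlap of the selected phrases. All function names and parameters (`phrases`, `phrase_weights`, `select_phrases`, `similarity`, `closest`, `k`, `max_len`) are hypothetical.

```python
import math
from collections import Counter


def phrases(sentence, max_len=3):
    """Return the set of contiguous word n-grams (1..max_len) of a sentence."""
    words = sentence.lower().split()
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    }


def phrase_weights(corpus, max_len=3):
    """Assumed quality measure: phrases found in fewer sentences weigh more."""
    df = Counter()
    for s in corpus:
        df.update(phrases(s, max_len))
    n = len(corpus)
    return {p: math.log((n + 1) / (c + 1)) + 1.0 for p, c in df.items()}


def select_phrases(sentence, weights, k=5, max_len=3):
    """Keep the k highest-weighted phrases (hypothetical selection rule).

    Phrases unseen in the corpus get the maximum known weight; ties are
    broken alphabetically so that the result is deterministic.
    """
    default = max(weights.values(), default=1.0)
    ranked = sorted(
        phrases(sentence, max_len),
        key=lambda p: (-weights.get(p, default), p),
    )
    return set(ranked[:k])


def similarity(a, b, weights, k=5, max_len=3):
    """Weighted Jaccard overlap of the selected phrases of two sentences."""
    pa = select_phrases(a, weights, k, max_len)
    pb = select_phrases(b, weights, k, max_len)
    default = max(weights.values(), default=1.0)
    union = sum(weights.get(p, default) for p in pa | pb)
    inter = sum(weights.get(p, default) for p in pa & pb)
    return inter / union if union else 0.0


def closest(query, corpus, weights, k=5):
    """Return the corpus sentence most similar to the query."""
    return max(corpus, key=lambda s: similarity(query, s, weights, k))


corpus = [
    "the cat sat on the mat",
    "stock prices fell sharply today",
    "he went to the bank",
]
weights = phrase_weights(corpus)
best = closest("the cat sat on a mat", corpus, weights)
```

In a translation-memory setting, the sentence returned by `closest` would be looked up to retrieve its stored translation. Selecting only a few phrases per sentence keeps the comparison cheap, at the cost of ignoring overlap among the discarded phrases.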
© 2011 Springer-Verlag Berlin Heidelberg
Fan, J., Li, G., Zhou, L. (2011). An Effective Approach for Searching Closest Sentence Translations from the Web. In: Yu, J.X., Kim, M.H., Unland, R. (eds) Database Systems for Advanced Applications. DASFAA 2011. Lecture Notes in Computer Science, vol 6588. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20152-3_4
DOI: https://doi.org/10.1007/978-3-642-20152-3_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20151-6
Online ISBN: 978-3-642-20152-3
eBook Packages: Computer Science (R0)