Abstract
Bilingual collocation correspondence is helpful to machine translation and second language learning. Existing techniques for identifying Chinese-English collocation correspondence suffer from two major problems: they are sensitive to the coverage of the bilingual dictionary, and they are insensitive to semantic and contextual information. This paper presents the ICT (Improved Collocation Translation) method to overcome these problems. For a given Chinese collocation, the word translation candidates extracted from a bilingual dictionary are expanded to improve coverage. A new translation model, which incorporates statistics extracted from monolingual corpora, word semantic similarities from a monolingual thesaurus, and bilingual context similarities, is employed to estimate and rank the probabilities of the collocation correspondence candidates. Experiments show that ICT is robust to the coverage of the bilingual dictionary. It achieves 50.1% accuracy for the first candidate and 73.1% accuracy for the top-3 candidates.
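The two steps named in the abstract, candidate expansion and candidate ranking, can be illustrated with a minimal sketch. It is not the ICT model itself: the abstract does not give the expansion rule or the scoring formula, so the sketch assumes expansion by thesaurus synonyms and a plain weighted sum of feature scores. The function names, the toy dictionary entries, and the toy counts are all hypothetical.

```python
from itertools import product


def expand_candidates(translations, synonyms):
    """Extend dictionary translations with their thesaurus synonyms.

    Assumption: expansion is done via a synonym table; the paper may
    use a different expansion rule.
    """
    expanded = []
    for word in translations:
        for cand in [word] + synonyms.get(word, []):
            if cand not in expanded:  # keep first-seen order, no duplicates
                expanded.append(cand)
    return expanded


def rank_candidates(head_cands, modifier_cands, features, weights):
    """Score every (head, modifier) pair and sort by descending score.

    Assumption: the score is a weighted sum of feature functions,
    standing in for the paper's probability estimate.
    """
    scored = []
    for pair in product(head_cands, modifier_cands):
        score = sum(w * f(pair) for f, w in zip(features, weights))
        scored.append((pair, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data, invented for illustration only.
synonyms = {"eat": ["take", "have"]}
corpus_counts = {("take", "medicine"): 5, ("eat", "medicine"): 1}

heads = expand_candidates(["eat"], synonyms)       # verb candidates
modifiers = expand_candidates(["medicine"], synonyms)  # noun candidates

# A single feature: co-occurrence count in a target-language corpus.
features = [lambda pair: corpus_counts.get(pair, 0)]
ranking = rank_candidates(heads, modifiers, features, weights=[1.0])
best_pair = ranking[0][0]
```

In this toy setting the dictionary offers only "eat", so the expansion step is what makes "take medicine" reachable at all; further features (thesaurus similarity, context similarity) would be added as extra functions in the features list with their own weights.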
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xu, R., Wong, KF., Lu, Q., Li, W. (2006). An Improved Method for Finding Bilingual Collocation Correspondences from Monolingual Corpora. In: Matsumoto, Y., Sproat, R.W., Wong, KF., Zhang, M. (eds) Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead. ICCPOL 2006. Lecture Notes in Computer Science, vol 4285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11940098_6
DOI: https://doi.org/10.1007/11940098_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49667-0
Online ISBN: 978-3-540-49668-7
eBook Packages: Computer Science (R0)