Abstract
One of the most critical issues in translating Korean into other languages is the frequent use of empty arguments. Unlike in English, even mandatory elements are often dropped in Korean, so the missing elements must be resolved during translation to produce grammatical sentences. In this paper, we focus on missing subjects at the intra-sentential level, a task that can be regarded as identifying whether two clauses share a subject. To incorporate syntactic information into the resolution of missing subjects, we use a parse tree kernel, a specialized convolution kernel. In the experimental evaluation, syntactic information turns out to contribute positively to the identification of subject shareness. Our method achieves an accuracy of 81.39% and outperforms a baseline system that assumes two adjacent clauses share a subject.
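A parse tree kernel of the kind named above is commonly computed with the Collins and Duffy recursion. It counts the tree fragments two parse trees have in common, optionally down-weighting larger fragments with a decay factor. The following is a minimal Python sketch of that standard recursion. The `Tree` class, the `tree_kernel` function and the decay parameter `lam` are hypothetical names chosen for illustration. The abstract does not say how clause pairs are turned into trees or which kernel settings were used, so this code is not the authors' implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Tree:
    """Hypothetical toy parse tree node; leaves (words) have no children."""
    label: str
    children: tuple[Tree, ...] = ()

    def production(self) -> tuple[str, tuple[str, ...]]:
        # The grammar rule applied at this node, e.g. ("NP", ("D", "N")).
        return (self.label, tuple(c.label for c in self.children))

    def is_preterminal(self) -> bool:
        # A POS node directly dominating a single word.
        return len(self.children) == 1 and not self.children[0].children


def internal_nodes(t: Tree):
    """Yield every non-leaf node of t in pre-order."""
    if t.children:
        yield t
        for c in t.children:
            yield from internal_nodes(c)


def tree_kernel(t1: Tree, t2: Tree, lam: float = 1.0) -> float:
    """Collins-Duffy style tree kernel: (decay-weighted) count of shared fragments."""
    memo: dict[tuple[int, int], float] = {}

    def delta(n1: Tree, n2: Tree) -> float:
        # Number of common fragments rooted at both n1 and n2.
        key = (id(n1), id(n2))
        if key in memo:
            return memo[key]
        if not n1.children or n1.production() != n2.production():
            val = 0.0  # leaves, or different rules: no shared fragment
        elif n1.is_preterminal():
            val = lam  # identical POS -> word rule
        else:
            val = lam
            for c1, c2 in zip(n1.children, n2.children):
                val *= 1.0 + delta(c1, c2)
        memo[key] = val
        return val

    return sum(delta(a, b) for a in internal_nodes(t1) for b in internal_nodes(t2))


if __name__ == "__main__":
    the_dog = Tree("NP", (Tree("D", (Tree("the"),)), Tree("N", (Tree("dog"),))))
    a_dog = Tree("NP", (Tree("D", (Tree("a"),)), Tree("N", (Tree("dog"),))))
    print(tree_kernel(the_dog, the_dog))  # 6.0
    print(tree_kernel(the_dog, a_dog))    # 3.0
```

In the example, the two noun phrases share three fragments: the rule `NP -> D N`, the fragment `NP -> D N(dog)`, and `N(dog)` itself. A kernel like this is typically passed to a kernel machine such as an SVM, which would then classify a clause pair as sharing or not sharing a subject. That pairing is an assumption here, not something the abstract states.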
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kim, KS., Park, SB., Song, HJ., Park, SY., Lee, SJ. (2008). Identification of Subject Shareness for Korean-English Machine Translation. In: Ho, TB., Zhou, ZH. (eds) PRICAI 2008: Trends in Artificial Intelligence. PRICAI 2008. Lecture Notes in Computer Science, vol 5351. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89197-0_22
DOI: https://doi.org/10.1007/978-3-540-89197-0_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89196-3
Online ISBN: 978-3-540-89197-0
eBook Packages: Computer Science, Computer Science (R0)