Abstract
Translation quality and parsing efficiency are often disappointing when rule-based machine translation systems deal with long sentences. Because of the complicated syntactic structure of the language, many ambiguous parse trees can be generated during the translation process, and it is not easy to select the parse tree that yields the correct translation. This paper presents an approach to parsing and translating long sentences efficiently, applied to rule-based Portuguese-Chinese machine translation. Long sentences are systematically broken down into shorter segments based on patterns, clauses, conjunctions, and punctuation to improve parsing performance. In addition, Constraint Synchronous Grammar is used to model the source and target languages simultaneously at the parsing stage, which further reduces ambiguity and improves parsing efficiency.
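As a rough illustration of the segmentation idea only, the sketch below splits a Portuguese sentence at punctuation first and then, for fragments that are still long, before a conjunction. It is not the authors' method: the conjunction list, the `max_words` threshold, and the function name `split_long_sentence` are all assumptions made for this example, and the pattern-based and clause-based rules mentioned in the abstract are not modelled.

```python
import re

# Hypothetical, deliberately tiny list of Portuguese conjunctions (an assumption).
CONJUNCTIONS = ("e", "mas", "ou", "porque", "embora")

# Matches the whitespace that precedes a standalone conjunction word.
_CONJ_BOUNDARY = re.compile(r"\s+(?=(?:%s)\s)" % "|".join(CONJUNCTIONS))


def split_long_sentence(sentence, max_words=8):
    """Split at , ; : first, then before conjunctions in fragments over max_words."""
    fragments = [f.strip() for f in re.split(r"[,;:]", sentence) if f.strip()]
    result = []
    for frag in fragments:
        if len(frag.split()) > max_words:
            # Fragment is still long: cut it before each conjunction.
            result.extend(p.strip() for p in _CONJ_BOUNDARY.split(frag) if p.strip())
        else:
            # Short fragments are kept whole, even if they contain a conjunction.
            result.append(frag)
    return result
```

Each resulting fragment could then be handed to the parser separately, so that the parser works on shorter inputs; how fragments are recombined after translation is outside the scope of this sketch.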
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Oliveira, F., Wong, F., Hong, IS. (2010). Systematic Processing of Long Sentences in Rule Based Portuguese-Chinese Machine Translation. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2010. Lecture Notes in Computer Science, vol 6008. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12116-6_35
DOI: https://doi.org/10.1007/978-3-642-12116-6_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12115-9
Online ISBN: 978-3-642-12116-6
eBook Packages: Computer Science (R0)