Systematic Processing of Long Sentences in Rule Based Portuguese-Chinese Machine Translation

Oliveira, Francisco; Wong, Fai; Hong, Iok-Sai

doi:10.1007/978-3-642-12116-6_35

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6008)

Abstract

Translation quality and parsing efficiency are often disappointing when rule-based machine translation systems deal with long sentences. Because of the complicated syntactic structure of the language, many ambiguous parse trees can be generated during translation, and it is not easy to select the most suitable one for generating the correct translation. This paper presents an approach to parsing and translating long sentences efficiently in rule-based Portuguese-Chinese machine translation. To improve parsing performance, long sentences are systematically broken into shorter segments based on patterns, clauses, conjunctions, and punctuation. In addition, Constraint Synchronous Grammar is used to model the source and target languages simultaneously at the parsing stage, further reducing ambiguity and improving parsing efficiency.
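
The idea of shortening a sentence before parsing can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify; it only shows one plausible way to cut a Portuguese sentence at punctuation and conjunctions. The conjunction list, the function name split_long_sentence, and the min_words threshold are all assumptions introduced here for illustration.

```python
import re

# Hypothetical, deliberately tiny list of Portuguese conjunctions. The paper's
# real inventory of patterns, clauses, and conjunctions is not given in the
# abstract, so this list is an assumption.
CONJUNCTIONS = ("mas", "porque", "embora", "enquanto")

# A boundary is whitespace that follows clause-level punctuation,
# or whitespace that precedes one of the conjunctions above.
_BOUNDARY = re.compile(
    r"(?<=[,;:])\s+|\s+(?=(?:%s)\b)" % "|".join(CONJUNCTIONS),
    flags=re.IGNORECASE,
)


def split_long_sentence(sentence, min_words=3):
    """Split a sentence into shorter segments for separate parsing.

    A segment shorter than min_words is merged with the piece that
    follows it, so that very small fragments are not parsed alone.
    """
    pieces = [p.strip() for p in _BOUNDARY.split(sentence) if p.strip()]
    segments = []
    for piece in pieces:
        if segments and len(segments[-1].split()) < min_words:
            # The previous segment is too short: attach this piece to it.
            segments[-1] = segments[-1] + " " + piece
        else:
            segments.append(piece)
    return segments


if __name__ == "__main__":
    example = ("O governo aprovou a nova lei, mas os deputados pediram "
               "mais tempo porque o texto é longo.")
    for segment in split_long_sentence(example):
        print(segment)
```

Each segment could then be handed to the parser on its own, which is where the reduction in ambiguous parse trees would be expected to come from. A rule-based system would likely use far richer cues than this single regular expression.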

Author information

Authors and Affiliations

Faculty of Science and Technology, University of Macau, Av. Padre Tomás Pereira, Taipa, Macao
Francisco Oliveira, Fai Wong & Iok-Sai Hong

Authors

Francisco Oliveira
Fai Wong
Iok-Sai Hong

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, 07738, Mexico City, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Oliveira, F., Wong, F., Hong, IS. (2010). Systematic Processing of Long Sentences in Rule Based Portuguese-Chinese Machine Translation. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2010. Lecture Notes in Computer Science, vol 6008. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12116-6_35

DOI: https://doi.org/10.1007/978-3-642-12116-6_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12115-9
Online ISBN: 978-3-642-12116-6
eBook Packages: Computer Science, Computer Science (R0)
