Learning-based Intrasentence Segmentation for Efficient Translation of Long Sentences

Kim, Sung-Dong; Zhang, Byoung-Tak; Kim, Yung Taek

doi:10.1023/A:1019896420277

Published: September 2001

Volume 16, pages 151–174, (2001)

Machine Translation

Abstract

Long-sentence analysis has been a critical problem in machine translation because of its high complexity. Intrasentence segmentation has been proposed as a method for reducing parsing complexity. This paper presents a two-step segmentation method: (1) identifying potential segmentation positions in a sentence and (2) selecting an actual segmentation position amongst them. We have attempted to apply machine learning techniques to the segmentation task: “concept learning” and “genetic learning”. By learning the “SegmentablePosition” concept, the rules for identifying potential segmentation positions are postulated. The selection of the actual segmentation position is based on a function whose parameters are determined by genetic learning. Experimental results are presented which illustrate the effectiveness of our approach to long-sentence parsing for MT. The results also show improved segmentation performance in comparison to other existing methods.
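
The two-step structure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the cue-word list, the two scoring features (split balance and cue-word presence), and the small evolutionary loop are all assumptions chosen only to show the overall shape of "propose candidate positions, then select one with a function whose weights are tuned by an evolutionary search". The paper learns its candidate rules from data, whereas the rule here is hard-coded.

```python
import random

# Hypothetical cue words (an assumption for illustration; the paper
# learns its "SegmentablePosition" rules rather than listing words).
CUE_WORDS = {"and", "but", "because", "which", "that", "when"}


def candidate_positions(tokens):
    """Step 1 (illustrative): indices where a new segment could begin."""
    return [
        i for i, tok in enumerate(tokens)
        if 0 < i < len(tokens) - 1
        and (tok.lower() in CUE_WORDS or tokens[i - 1] == ",")
    ]


def score(tokens, pos, weights):
    """Step 2 (illustrative): weighted score of one candidate position."""
    w_balance, w_cue = weights
    n = len(tokens)
    balance = 1.0 - abs(2 * pos - n) / n  # 1.0 when the split is central
    cue = 1.0 if tokens[pos].lower() in CUE_WORDS else 0.0
    return w_balance * balance + w_cue * cue


def select_position(tokens, weights):
    """Return the best-scoring candidate, or None if there is none."""
    cands = candidate_positions(tokens)
    if not cands:
        return None
    return max(cands, key=lambda p: score(tokens, p, weights))


def fitness(weights, examples):
    """Number of examples whose selected position matches the gold one."""
    return sum(select_position(toks, weights) == gold
               for toks, gold in examples)


def evolve(examples, generations=20, pop_size=8, seed=0):
    """Toy evolutionary search over the two weights (hypothetical setup):
    keep the better half, refill with Gaussian-mutated copies."""
    rng = random.Random(seed)
    pop = [(rng.random(), rng.random()) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, examples), reverse=True)
        parents = pop[: pop_size // 2]
        children = [
            tuple(max(0.0, g + rng.gauss(0, 0.1)) for g in rng.choice(parents))
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    return max(pop, key=lambda w: fitness(w, examples))


sentence = "I left when it rained and she stayed because it was late".split()
training = [(sentence, 5)]  # gold position 5 ("and") is made up for the demo
best_weights = evolve(training)
print(select_position(sentence, best_weights))
```

Keeping candidate generation separate from selection mirrors the abstract's division of labour: the first step narrows the search to a few positions, so the scoring function only has to rank those candidates rather than every token boundary.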

Author information

Authors and Affiliations

Department of Computer Engineering, Hansung University, Samsun-dong, Sungbuk-gu, Seoul, Korea
Sung-Dong Kim, Byoung-Tak Zhang & Yung Taek Kim

About this article

Cite this article

Kim, SD., Zhang, BT. & Kim, Y.T. Learning-based Intrasentence Segmentation for Efficient Translation of Long Sentences. Machine Translation 16, 151–174 (2001). https://doi.org/10.1023/A:1019896420277
