
Learning-based Intrasentence Segmentation for Efficient Translation of Long Sentences

Machine Translation

Abstract

Long-sentence analysis has been a critical problem in machine translation (MT) because of its high complexity. Intrasentence segmentation has been proposed as a method for reducing parsing complexity. This paper presents a two-step segmentation method: (1) identifying potential segmentation positions in a sentence and (2) selecting an actual segmentation position among them. We apply two machine learning techniques to the segmentation task: "concept learning" and "genetic learning". By learning the "SegmentablePosition" concept, rules for identifying potential segmentation positions are postulated. The actual segmentation position is selected by a function whose parameters are determined by genetic learning. Experimental results illustrate the effectiveness of our approach to long-sentence parsing for MT and show improved segmentation performance in comparison with other existing methods.
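The abstract names the two steps but not the features, learned rules, or selection function, so the following is only a minimal Python sketch of the general shape of such a pipeline, under stated assumptions. Everything in it is hypothetical: a fixed trigger-tag rule (`TRIGGERS`) stands in for the concept-learned "SegmentablePosition" rules, a linear score over a hand-picked feature vector (`features`) stands in for the selection function, and a small elitist genetic algorithm (`evolve`) tunes that score's weights against gold split positions. None of these names or settings come from the paper.

```python
import random

# HYPOTHETICAL stand-in for the learned "SegmentablePosition" rules:
# a split before token p is a candidate if token p carries one of these
# Penn Treebank-style POS tags.
TRIGGERS = ("CC", "IN", "WDT", "WP", ",")


def candidate_positions(tags):
    """Step 1: boundaries p (split before token p) that pass the trigger rule."""
    return [p for p in range(1, len(tags)) if tags[p] in TRIGGERS]


def features(tags, p):
    """HYPOTHETICAL features: segment balance plus a one-hot trigger tag."""
    n = len(tags)
    balance = min(p, n - p) / n  # 0.5 when both segments have equal length
    return [balance] + [1.0 if tags[p] == t else 0.0 for t in TRIGGERS]


def select_position(tags, weights):
    """Step 2: the candidate with the highest weighted score, or None."""
    cands = candidate_positions(tags)
    if not cands:
        return None

    def score(p):
        return sum(w * f for w, f in zip(weights, features(tags, p)))

    return max(cands, key=score)  # ties go to the leftmost candidate


def fitness(weights, data):
    """Fraction of (tags, gold_position) pairs where the selection matches gold."""
    return sum(select_position(t, weights) == g for t, g in data) / len(data)


def evolve(data, pop_size=20, generations=30, sigma=0.3, seed=0):
    """Minimal elitist genetic algorithm over weight vectors (pop_size >= 4)."""
    rng = random.Random(seed)
    dim = 1 + len(TRIGGERS)
    pop = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    history = []  # best fitness per generation
    for _ in range(generations):
        ranked = sorted(pop, key=lambda w: fitness(w, data), reverse=True)
        history.append(fitness(ranked[0], data))
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [ranked[0]]  # elitism: carry the best vector over unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]  # uniform crossover
            child = [c + rng.gauss(0, sigma) for c in child]  # Gaussian mutation
            children.append(child)
        pop = children
    best = max(pop, key=lambda w: fitness(w, data))
    return best, history


# Toy (tags, gold split position) pairs, invented purely for illustration.
TRAIN = [
    (["PRP", "VBD", "IN", "DT", "NN", "CC", "PRP", "VBD"], 5),
    (["DT", "NN", "WDT", "VBD", "NN", ",", "PRP", "VBD", "RB"], 5),
    (["NNP", "VBD", "IN", "NN", "IN", "DT", "NN", "WP", "VBD"], 7),
]

if __name__ == "__main__":
    s = TRAIN[0][0]
    print(candidate_positions(s))  # → [2, 5]
    print(select_position(s, [1, 0, 0, 0, 0, 0]))  # → 5 (favours balanced segments)
    best, history = evolve(TRAIN)
    print(history[-1], select_position(s, best))
```

Keeping the best vector unchanged in each new generation (elitism) guarantees that the best training fitness never decreases from one generation to the next; exact-match accuracy is used as the fitness only because it is the simplest choice, not because the paper specifies it.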




Cite this article

Kim, SD., Zhang, BT. & Kim, Y.T. Learning-based Intrasentence Segmentation for Efficient Translation of Long Sentences. Machine Translation 16, 151–174 (2001). https://doi.org/10.1023/A:1019896420277
