Abstract
In dependency parsing of long sentences that contain fewer subjects than predicates, it is difficult to determine which predicate governs which subject. To handle this syntactic ambiguity between subjects and predicates, we define an “S(ubject)-clause” as a group of words containing several predicates and their common subject, and propose an automatic S-clause segmentation method that uses semantic features as well as morpheme features. We also propose a new dependency tree that reflects S-clauses, in which trace information indicates the omitted subject of each predicate. S-clause information proved very effective in analyzing long sentences, improving parsing performance by 4.5%. Precision in determining the governors of subjects in dependency parsing improved by 32%.
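To make the S-clause idea concrete, the following is a minimal Python sketch of how S-clause spans and subject traces might be represented. It is not the authors' method: the `Token` and `SClause` types, the coarse role labels, the hand-supplied clause boundaries (standing in for the segmentation step), and the rule that the last predicate of an S-clause governs the overt subject are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    index: int
    form: str
    role: str  # hypothetical coarse labels: "SUBJ", "PRED", "OTHER"


@dataclass
class SClause:
    start: int               # first token index (inclusive)
    end: int                 # last token index (inclusive)
    subject: Optional[int]   # index of the shared subject, or None
    predicates: List[int]    # indices of the predicates in this S-clause


def build_s_clauses(tokens: List[Token], clause_ends: List[int]) -> List[SClause]:
    """Group tokens into S-clauses, given the last token index of each clause.

    The boundaries are supplied by hand here; in the paper they come from
    an automatic segmentation step that this sketch does not reproduce.
    """
    clauses = []
    start = 0
    for end in clause_ends:
        span = tokens[start:end + 1]
        subject = next((t.index for t in span if t.role == "SUBJ"), None)
        predicates = [t.index for t in span if t.role == "PRED"]
        clauses.append(SClause(start, end, subject, predicates))
        start = end + 1
    return clauses


def subject_links(clauses: List[SClause]) -> List[Tuple[int, int, bool]]:
    """Return (predicate, subject, is_trace) triples for every predicate.

    Assumption: the last predicate of an S-clause governs the overt
    subject; every other predicate receives a trace of that subject.
    """
    links = []
    for clause in clauses:
        if clause.subject is None or not clause.predicates:
            continue
        overt_governor = clause.predicates[-1]
        for pred in clause.predicates:
            links.append((pred, clause.subject, pred != overt_governor))
    return links


# English gloss used only for illustration: two subjects, three predicates.
words = [("John", "SUBJ"), ("came", "PRED"), ("home", "OTHER"),
         ("ate", "PRED"), ("dinner", "OTHER"),
         ("Mary", "SUBJ"), ("watched", "PRED"), ("TV", "OTHER")]
tokens = [Token(i, form, role) for i, (form, role) in enumerate(words)]

clauses = build_s_clauses(tokens, clause_ends=[4, 7])
links = subject_links(clauses)
for pred, subj, is_trace in links:
    kind = "trace" if is_trace else "overt"
    print(f"{tokens[pred].form} <- {tokens[subj].form} ({kind})")
```

Once the boundaries are known, each predicate can only pick a subject from inside its own S-clause, which is how the segmentation step narrows the subject-predicate ambiguity described above.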
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kim, MY., Lee, JH. (2005). Syntactic Analysis of Long Sentences Based on S-Clauses. In: Su, KY., Tsujii, J., Lee, JH., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2004. IJCNLP 2004. Lecture Notes in Computer Science, vol 3248. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30211-7_55
DOI: https://doi.org/10.1007/978-3-540-30211-7_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24475-2
Online ISBN: 978-3-540-30211-7
eBook Packages: Computer Science (R0)