Annotating Signs of Syntactic Complexity to Support Sentence Simplification

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8082)

Abstract

This article presents a new annotation scheme for syntactic complexity in text. Its advantages over existing syntactic annotation schemes are that it is easy to apply, is reliable, and can encode a wide range of phenomena. The scheme is based on the notion that the syntactic complexity of sentences is explicitly indicated by signs such as conjunctions, complementisers and punctuation marks. The article describes the annotation scheme developed to annotate these signs and evaluates three corpora, containing texts from three genres, that were annotated using it. Inter-annotator agreement calculated on the three corpora shows at least “substantial agreement”, and the results motivate directions for future work.
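
As a rough illustration of how inter-annotator agreement of this kind can be computed, the sketch below implements Cohen's kappa for two annotators. It rests on assumptions: the abstract does not name the agreement statistic, although “substantial agreement” is the conventional label for kappa values between 0.61 and 0.80. The function name `cohen_kappa` and the class labels `COORD` and `SUBORD` are hypothetical and are not taken from the paper's tag set.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: derived from each annotator's label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels assigned to six signs (not data from the paper).
annotator_1 = ["COORD", "SUBORD", "COORD", "COORD", "SUBORD", "COORD"]
annotator_2 = ["COORD", "SUBORD", "COORD", "SUBORD", "SUBORD", "COORD"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

In this toy example the annotators agree on five of six signs, chance agreement is 0.5, and kappa is about 0.667, which falls within the range conventionally described as substantial.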

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Evans, R., Orăsan, C. (2013). Annotating Signs of Syntactic Complexity to Support Sentence Simplification. In: Habernal, I., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2013. Lecture Notes in Computer Science, vol 8082. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40585-3_13

  • DOI: https://doi.org/10.1007/978-3-642-40585-3_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40584-6

  • Online ISBN: 978-3-642-40585-3

  • eBook Packages: Computer Science, Computer Science (R0)
