
Sentence Compression as a Supervised Learning with a Rich Feature Space

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2018)

Abstract

We present a novel supervised approach to sentence compression, based on the classification and removal of word sequences generated from subtrees of the original sentence's dependency tree. Our system may use any standard classifier, such as Support Vector Machines or a Logistic Model Tree, to identify word sequences that can be removed without compromising the grammatical correctness of the compressed sentence. We trained our system using several classifiers on a small annotated dataset of 100 sentences, which included around 1500 manually labeled subtrees (removal candidates) represented by 25 features. The highest cross-validation classification accuracy of 80% was obtained with the SMO (Normalized Poly Kernel) algorithm. We evaluated the readability and informativeness of the sentences compressed by the SMO-based classification model with the help of human raters, using a separate benchmark dataset of 200 sentences.
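The pipeline the abstract describes can be sketched in a few lines: enumerate dependency subtrees of a sentence as removal candidates, score each with a binary classifier, and delete the word spans predicted removable. This is a minimal illustration only; the names (`Token`, `compress`, the stand-in classifier) are hypothetical, and the real system uses a trained classifier over the paper's 25 features rather than the toy rule shown here.

```python
# Minimal sketch of subtree-deletion compression, assuming a
# dependency-parsed sentence. Token, subtree_indices, and compress are
# illustrative names, not from the paper.
from dataclasses import dataclass

@dataclass
class Token:
    idx: int    # position in the sentence
    word: str
    head: int   # index of the head token (-1 for the root)

def subtree_indices(tokens, root_idx):
    """Collect the sorted indices of the subtree rooted at root_idx."""
    result = {root_idx}
    changed = True
    while changed:
        changed = False
        for t in tokens:
            if t.head in result and t.idx not in result:
                result.add(t.idx)
                changed = True
    return sorted(result)

def compress(tokens, classify):
    """Remove every subtree the classifier marks as removable."""
    removed = set()
    for t in tokens:
        if t.head == -1:
            continue  # never consider deleting the whole sentence
        span = subtree_indices(tokens, t.idx)
        if classify(tokens, span):
            removed.update(span)
    return " ".join(t.word for t in tokens if t.idx not in removed)

# Toy parse of "John , a great actor , slept" (appositive attached to "John").
sent = [Token(0, "John", 6), Token(1, ",", 4), Token(2, "a", 4),
        Token(3, "great", 4), Token(4, "actor", 0), Token(5, ",", 4),
        Token(6, "slept", -1)]

# Stand-in for the trained classifier: mark only the appositive removable.
removable = lambda toks, span: span == [1, 2, 3, 4, 5]
print(compress(sent, removable))  # prints "John slept"
```

In the paper's setting, the stand-in `removable` function would be replaced by the trained SMO model applied to the 25-feature representation of each candidate subtree.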


Notes

  1. Complete sentences, and single words that are prepositions, wh-words, pronouns, or forms of the verb "to be", are not saved in the list of removal candidates.

  2. All syntax features are calculated using the dependency syntax tree, except for the last one, which is obtained from the constituency parse tree.

  3. We use the "LEAF" type for a single word.

  4. We use "null" for words that are not named entities (non-NEs).

  5. http://www-nlpir.nist.gov/projects/duc/data.html.
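Note 1's filtering rule can be illustrated with a short sketch: a candidate span is dropped if it covers the whole sentence, or if it is a single word belonging to one of the excluded classes. The word lists below are partial examples for illustration, not the paper's exact inventory, and `keep_candidate` is a hypothetical helper name.

```python
# Illustrative filter for Note 1: complete sentences, and single-word
# candidates that are prepositions, wh-words, pronouns, or forms of
# "to be", are excluded from the removal-candidate list.
# The word sets are partial examples, not the paper's exact lists.
WH_WORDS = {"who", "what", "when", "where", "which", "why", "how"}
PRONOUNS = {"he", "she", "it", "they", "we", "you", "i"}
PREPOSITIONS = {"in", "on", "at", "by", "with", "of", "for"}
TO_BE = {"be", "am", "is", "are", "was", "were", "been", "being"}
EXCLUDED_SINGLE_WORDS = WH_WORDS | PRONOUNS | PREPOSITIONS | TO_BE

def keep_candidate(words, sentence_len):
    """Return True if this word span stays in the candidate list."""
    if len(words) == sentence_len:   # complete sentence: never a candidate
        return False
    if len(words) == 1 and words[0].lower() in EXCLUDED_SINGLE_WORDS:
        return False
    return True

print(keep_candidate(["is"], 8))           # prints False (form of "to be")
print(keep_candidate(["very", "old"], 8))  # prints True
```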



Author information

Correspondence to Natalia Vanetik.


Copyright information

© 2023 Springer Nature Switzerland AG

About this paper


Cite this paper

Churkin, E., Last, M., Litvak, M., Vanetik, N. (2023). Sentence Compression as a Supervised Learning with a Rich Feature Space. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2018. Lecture Notes in Computer Science, vol 13397. Springer, Cham. https://doi.org/10.1007/978-3-031-23804-8_21

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-23804-8_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-23803-1

  • Online ISBN: 978-3-031-23804-8

  • eBook Packages: Computer Science, Computer Science (R0)
