
Sentence Compression as a Supervised Learning with a Rich Feature Space

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2018)

Abstract

We present a novel supervised approach to sentence compression, based on the classification and removal of word sequences generated from subtrees of the original sentence's dependency tree. Our system may use any standard classifier, such as Support Vector Machines or a Logistic Model Tree, to identify word sequences that can be removed without compromising the grammatical correctness of the compressed sentence. We trained our system using several classifiers on a small annotated dataset of 100 sentences, which included around 1500 manually labeled subtrees (removal candidates) represented by 25 features. The highest cross-validation classification accuracy of 80% was obtained with the SMO (Normalized Poly Kernel) algorithm. We evaluated the readability and informativeness of the sentences compressed by the SMO-based classification model with the help of human raters, using a separate benchmark dataset of 200 sentences.
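The pipeline the abstract describes can be sketched in a few lines: enumerate dependency subtrees of a sentence as removal candidates, score each with a binary classifier, and delete the word spans predicted removable. This is a minimal illustration only; the names (`Token`, `compress`, the stand-in classifier) are hypothetical, and the real system uses a trained classifier over the paper's 25 features rather than the toy rule shown here.

```python
# Minimal sketch of subtree-deletion compression, assuming a
# dependency-parsed sentence. Token, subtree_indices, and compress are
# illustrative names, not from the paper.
from dataclasses import dataclass

@dataclass
class Token:
    idx: int    # position in the sentence
    word: str
    head: int   # index of the head token (-1 for the root)

def subtree_indices(tokens, root_idx):
    """Collect the sorted indices of the subtree rooted at root_idx."""
    result = {root_idx}
    changed = True
    while changed:
        changed = False
        for t in tokens:
            if t.head in result and t.idx not in result:
                result.add(t.idx)
                changed = True
    return sorted(result)

def compress(tokens, classify):
    """Remove every subtree the classifier marks as removable."""
    removed = set()
    for t in tokens:
        if t.head == -1:
            continue  # never consider deleting the whole sentence
        span = subtree_indices(tokens, t.idx)
        if classify(tokens, span):
            removed.update(span)
    return " ".join(t.word for t in tokens if t.idx not in removed)

# Toy parse of "John , a great actor , slept" (appositive attached to "John").
sent = [Token(0, "John", 6), Token(1, ",", 4), Token(2, "a", 4),
        Token(3, "great", 4), Token(4, "actor", 0), Token(5, ",", 4),
        Token(6, "slept", -1)]

# Stand-in for the trained classifier: mark only the appositive removable.
removable = lambda toks, span: span == [1, 2, 3, 4, 5]
print(compress(sent, removable))  # prints "John slept"
```

In the paper's setting, the stand-in `removable` function would be replaced by the trained SMO model applied to the 25-feature representation of each candidate subtree.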


Notes

  1. Complete sentences, and single words that are prepositions, wh-words, pronouns, or forms of the verb "to be", are not saved in the list of removal candidates.

  2. All syntax features are calculated using the dependency syntax tree, except for the last one, which is obtained from the constituency parse tree.

  3. We use the "LEAF" type for a single word.

  4. We use "null" for words that are not named entities (non-NEs).

  5. http://www-nlpir.nist.gov/projects/duc/data.html.
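Note 1's filtering rule can be illustrated with a short sketch: a candidate span is dropped if it covers the whole sentence, or if it is a single word belonging to one of the excluded classes. The word lists below are partial examples for illustration, not the paper's exact inventory, and `keep_candidate` is a hypothetical helper name.

```python
# Illustrative filter for Note 1: complete sentences, and single-word
# candidates that are prepositions, wh-words, pronouns, or forms of
# "to be", are excluded from the removal-candidate list.
# The word sets are partial examples, not the paper's exact lists.
WH_WORDS = {"who", "what", "when", "where", "which", "why", "how"}
PRONOUNS = {"he", "she", "it", "they", "we", "you", "i"}
PREPOSITIONS = {"in", "on", "at", "by", "with", "of", "for"}
TO_BE = {"be", "am", "is", "are", "was", "were", "been", "being"}
EXCLUDED_SINGLE_WORDS = WH_WORDS | PRONOUNS | PREPOSITIONS | TO_BE

def keep_candidate(words, sentence_len):
    """Return True if this word span stays in the candidate list."""
    if len(words) == sentence_len:   # complete sentence: never a candidate
        return False
    if len(words) == 1 and words[0].lower() in EXCLUDED_SINGLE_WORDS:
        return False
    return True

print(keep_candidate(["is"], 8))           # prints False (form of "to be")
print(keep_candidate(["very", "old"], 8))  # prints True
```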



Author information

Correspondence to Natalia Vanetik.


Copyright information

© 2023 Springer Nature Switzerland AG

About this paper


Cite this paper

Churkin, E., Last, M., Litvak, M., Vanetik, N. (2023). Sentence Compression as a Supervised Learning with a Rich Feature Space. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2018. Lecture Notes in Computer Science, vol 13397. Springer, Cham. https://doi.org/10.1007/978-3-031-23804-8_21

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-23804-8_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-23803-1

  • Online ISBN: 978-3-031-23804-8

  • eBook Packages: Computer Science, Computer Science (R0)
