Optimization in Extractive Summarization Processes Through Automatic Classification

Garrido, Angel Luis; Bobed, Carlos; Cardiel, Oscar; Aleyxendri, Andrea; Quilez, Ruben

doi:10.1007/978-3-319-77116-8_38

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10762)

Included in the following conference series:

International Conference on Computational Linguistics and Intelligent Text Processing

Abstract

The results of an extractive automatic summarization task depend to a great extent on the nature of the processed texts (e.g., news, medicine, or literature). In fact, general-purpose methods usually need to be modified ad hoc to improve their performance in a particular application context. However, this customization requires considerable effort from domain experts and application developers, so it is not always possible or appropriate. In this paper, we propose a multi-language approach to extractive summarization that adapts itself to different text domains in order to improve its performance. In a training step, our approach leverages the features of the text documents to classify them using machine learning techniques. Then, once the typology of each text is identified, it tunes the parameters of the extraction mechanism by solving an optimization problem for each document class. This classifier, along with the learned optimizations associated with each document class, allows our system to adapt to each input text automatically. The proposed method has been applied in a real environment at a media company, with promising results.
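The classify-then-tune idea described in the abstract can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the class names ("news", "opinion"), the two sentence features (position and term frequency), the weight values, and the keyword rule that stands in for the trained classifier. In the approach described above, the classifier would be learned from document features and the per-class parameters would come from solving an optimization problem.

```python
import re
from collections import Counter

# Hypothetical per-class parameters. In the described approach these would be
# obtained by solving an optimization problem for each document class.
CLASS_WEIGHTS = {
    "news": {"position": 0.7, "frequency": 0.3},
    "opinion": {"position": 0.2, "frequency": 0.8},
}


def classify(text):
    """Toy stand-in for the trained classifier (a keyword rule, not machine learning)."""
    if re.search(r"\b(I think|in my view)\b", text, re.I):
        return "opinion"
    return "news"


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, n=1):
    """Return the n highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    weights = CLASS_WEIGHTS[classify(text)]
    freq = Counter(re.findall(r"\w+", text.lower()))
    max_freq = max(freq.values(), default=1)
    scored = []
    for i, sentence in enumerate(sentences):
        words = re.findall(r"\w+", sentence.lower())
        # Earlier sentences get a higher position score.
        position = 1.0 - i / len(sentences)
        # Average normalized frequency of the words in the sentence.
        frequency = (
            sum(freq[w] for w in words) / (len(words) * max_freq) if words else 0.0
        )
        score = weights["position"] * position + weights["frequency"] * frequency
        scored.append((score, i, sentence))
    # Pick the top n by score (ties broken by position), then restore document order.
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:n]
    return [s for _, _, s in sorted(top, key=lambda t: t[1])]
```

Calling summarize(text, n=2) on a short text returns its two best-scoring sentences under the weights of whichever class the toy rule assigns. Swapping in a real classifier or learned weights only requires replacing classify and CLASS_WEIGHTS.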

Notes

1. In fact, they belong to the ValF family of functions as well, but for the sake of readability we have decided to change their names.
2. We are aware that we could get rid of the baseline term, but it is useful for comparing our approach with generic approaches.
3. http://www.heraldo.es.
4. Stop words are common words that carry no relevant information (e.g., articles or conjunctions).
5. A lemma is the canonical form of a word. For example, in English, sing, sings, sang, sung, and singing are different forms of the same verb, with “sing” as their common lemma.
6. http://swesum.nada.kth.se/index-eng.html.
7. https://www.tools4noobs.com/summarize/.
8. http://autosummarizer.com/.
9. http://textsummarization.net/.

Acknowledgments

This research work has been supported by the CICYT projects TIN2013-46238-C4-4-R and TIN2016-78011-C4-3-R (AEI/FEDER, UE), and by DGA/FEDER. We want to thank Grupo Heraldo for their collaboration, especially Domingo Tardos and Susana Sangiao.

Author information

Authors and Affiliations

Department of Computer Science and Systems Engineering, University of Zaragoza, Zaragoza, Spain
Angel Luis Garrido & Carlos Bobed
Aragon Institute of Engineering Research (I3A), Zaragoza, Spain
Carlos Bobed
IT Department, Grupo Heraldo, Zaragoza, Spain
Oscar Cardiel, Andrea Aleyxendri & Ruben Quilez

Corresponding author

Correspondence to Angel Luis Garrido.

Editor information

Editors and Affiliations

CIC, Instituto Politécnico Nacional, Mexico City, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Garrido, A.L., Bobed, C., Cardiel, O., Aleyxendri, A., Quilez, R. (2018). Optimization in Extractive Summarization Processes Through Automatic Classification. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2017. Lecture Notes in Computer Science, vol 10762. Springer, Cham. https://doi.org/10.1007/978-3-319-77116-8_38

DOI: https://doi.org/10.1007/978-3-319-77116-8_38
Published: 10 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77115-1
Online ISBN: 978-3-319-77116-8
eBook Packages: Computer Science, Computer Science (R0)
