Effect of Preprocessing on Extractive Summarization with Maximal Frequent Sequences

Ledeneva, Yulia

doi:10.1007/978-3-540-88636-5_11

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5317)

Abstract

The task of extractive summarization consists of producing a text summary by extracting a subset of text segments, such as sentences, and concatenating them to form a summary of the original text. The selection of sentences is based on the terms they contain, which can be single words or multiword expressions. In previous work, we suggested using so-called Maximal Frequent Sequences as such terms. In this paper, we investigate the effect of preprocessing on the process of selecting such sequences. Our results suggest that, contrary to expectations, the accuracy of the method is not seriously affected by preprocessing, which is both bad and good news, as we show.
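The sentence-selection scheme described in the abstract can be illustrated with a small sketch. It is not the author's implementation, and several details are assumptions: word sequences are taken to be contiguous, the frequency threshold `beta` counts all occurrences in the text, a toy stopword list is the only preprocessing step, and each sentence is scored (hypothetically) as the sum of the text frequencies of the maximal frequent sequences it contains. All function names are illustrative, and the brute-force search stands in for the dedicated fast MFS algorithms cited in the references.

```python
import re
from collections import Counter

# Hypothetical toy stopword list (assumption); a real system would use a full list.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "on", "to", "is", "are", "was", "for", "with"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def preprocess(sentence, remove_stopwords=True):
    """Lowercase, keep alphabetic tokens, and optionally drop stopwords."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


def maximal_frequent_sequences(token_lists, beta=2):
    """Contiguous word sequences occurring at least `beta` times that are not
    contained in any longer frequent sequence (simplified brute-force search)."""
    frequent = {}
    n = 1
    while True:
        counts = Counter(
            tuple(toks[i:i + n])
            for toks in token_lists
            for i in range(len(toks) - n + 1)
        )
        level = {seq: c for seq, c in counts.items() if c >= beta}
        if not level:
            # No frequent n-grams means no longer sequence can be frequent either.
            break
        frequent.update(level)
        n += 1

    def contained(short, longer):
        k = len(short)
        return any(longer[i:i + k] == short for i in range(len(longer) - k + 1))

    # Maximality: discard sequences that occur inside a longer frequent sequence.
    return {
        seq: c for seq, c in frequent.items()
        if not any(len(o) > len(seq) and contained(seq, o) for o in frequent)
    }


def summarize(text, num_sentences=2, beta=2, remove_stopwords=True):
    """Pick the highest-scoring sentences and return them in original order."""
    sentences = split_sentences(text)
    token_lists = [preprocess(s, remove_stopwords) for s in sentences]
    mfs = maximal_frequent_sequences(token_lists, beta)

    def score(tokens):
        # Hypothetical weighting: each MFS occurrence adds its text frequency.
        total = 0
        for seq, freq in mfs.items():
            k = len(seq)
            hits = sum(1 for i in range(len(tokens) - k + 1) if tuple(tokens[i:i + k]) == seq)
            total += hits * freq
        return total

    ranked = sorted(range(len(sentences)), key=lambda i: (-score(token_lists[i]), i))
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)
```

Calling `summarize(text, remove_stopwords=False)` on the same input repeats the selection without stopword removal, which is the kind of preprocessing comparison the paper investigates; a stemmer could be added inside `preprocess` in the same way. Ties in score are broken by sentence position, and the selected sentences are concatenated in their original order, as extractive summarization requires.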

References

Ledeneva, Y., Gelbukh, A., García-Hernández, R.: Terms Derived from Frequent Sequences for Extractive Text Summarization. In: Gelbukh, A. (ed.) CICLing 2008. LNCS, vol. 4919, pp. 593–604. Springer, Heidelberg (2008)
Ledeneva, Y., Gelbukh, A., García-Hernández, R.: Keeping Maximal Frequent Sequences Facilitates Extractive Summarization. In: Sidorov, G., et al. (eds.) Advances in Computer Science and Engineering, 9th Conference on Computing (CORE-2008), Research in Computing Science, vol. 34, pp. 163–174 (2008) ISSN: 1870-4069
Pomikálek, J., Rehurek, R.: The Influence of Preprocessing Parameters on Text Categorization. In: Proc. of World Academy of Science, Engineering and Technology, vol. 21, pp. 430–434 (2007)
Abu-Salem, H., Al-Omari, M., Evens, M.W.: Stemming Methodologies over Individual Words for an Arabic Information Retrieval System. Journal of the American Society for Information Science 50, 524–529 (1999)
Larkey, L.S., Ballesteros, L., Connell, M.: Improving Stemming for Arabic Information Retrieval: Light Stemming and Co-occurrence Analysis. In: Proc. of ACM SIGIR Conference, pp. 275–282 (2002)
Halácsy, P., Trón, V.: Benefits of Resource-Based Stemming in Hungarian Information Retrieval. In: Peters, C., Clough, P., Gey, F.C., Karlgren, J., Magnini, B., Oard, D.W., de Rijke, M., Stempfhuber, M. (eds.) CLEF 2006. LNCS, vol. 4730, pp. 99–106. Springer, Heidelberg (2007)
Hamzah, M.P., Tengku Sembok, M.: On Retrieval Performance of Malay Textual Documents. In: Proc. of IASTED, pp. 156–161. ACTA Press (2006)
Frakes, W., Baeza-Yates, R.: Information Retrieval: Data Structures and Algorithms. Prentice Hall, Englewood Cliffs (1992)
Villatoro-Tello, E., Villaseñor-Pineda, L., Montes-y-Gómez, M.: Using Word Sequences for Text Summarization. In: Sojka, P., Kopeček, I., Pala, K. (eds.) TSD 2006. LNCS (LNAI), vol. 4188, pp. 293–300. Springer, Heidelberg (2006)
Liu, D., et al.: Multi-Document Summarization Based on BE-Vector Clustering. In: Gelbukh, A. (ed.) CICLing 2006. LNCS, vol. 3878, pp. 470–479. Springer, Heidelberg (2006)
Bolshakov, I.A.: Getting One’s First Million...Collocations. In: Gelbukh, A. (ed.) CICLing 2004. LNCS, vol. 2945, pp. 229–242. Springer, Heidelberg (2004)
Sidorov, G., Gelbukh, A.: Automatic Detection of Semantically Primitive Words Using Their Reachability in an Explanatory Dictionary. In: IEEE International Workshop on Natural Language Processing and Knowledge Engineering, NLPKE 2001 at Proc. International IEEE SMC-2001 Conference: Systems, Man, and Cybernetics, USA, pp. 1683–1687 (2001) ISBN 0-7803-7087-2
Song, Y., et al.: A Term Weighting Method Based on Lexical Chain for Automatic Summarization. In: Gelbukh, A. (ed.) CICLing 2004. LNCS, vol. 2945, pp. 636–639. Springer, Heidelberg (2004)
Mihalcea, R.: Random Walks on Text Structures. In: Gelbukh, A. (ed.) CICLing 2006. LNCS, vol. 3878, pp. 249–262. Springer, Heidelberg (2006)
Mihalcea, R., Tarau, P.: TextRank: Bringing Order into Texts. In: Proc. of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2004), Barcelona, Spain (2004)
Baeza-Yates, R.: Modern Information Retrieval. Addison Wesley/Longman Publishing Co. (1999)
Frakes, W., Baeza-Yates, R.: Information Retrieval: Data Structures and Algorithms. Prentice Hall, Englewood Cliffs (1992)
Sparck Jones, K., Willett, P.: Readings in Information Retrieval. Morgan Kaufmann, San Francisco (1997)
García-Hernández, R.A., Martínez-Trinidad, J.F., Carrasco-Ochoa, J.A.: A Fast Algorithm to Find All the Maximal Frequent Sequences in a Text. In: Sanfeliu, A., Martínez Trinidad, J.F., Carrasco Ochoa, J.A. (eds.) CIARP 2004. LNCS, vol. 3287, pp. 478–486. Springer, Heidelberg (2004)
García-Hernández, R.A., Martínez-Trinidad, J.F., Carrasco-Ochoa, J.A.: A New Algorithm for Fast Discovery of Maximal Sequential Patterns in a Document Collection. In: Gelbukh, A. (ed.) CICLing 2006. LNCS, vol. 3878, pp. 514–523. Springer, Heidelberg (2006)
DUC. Document Understanding Conference 2002 (2002), www-nlpir.nist.gov/projects/duc
Lin, C.Y.: ROUGE: A Package for Automatic Evaluation of Summaries. In: Proc. of Workshop on Text Summarization of ACL, Spain (2004)

Author information

Authors and Affiliations

Center for Computing Research, National Polytechnic Institute, Av. Juan de Dios Bátiz s/n, D.F., 07738, Mexico
Yulia Ledeneva

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, 07738, Mexico City, México
Alexander Gelbukh
Ciencias Computacionales, Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Luis Enrique Erro #1 , Sta. María Tonantzintla, 72840, Puebla, México
Eduardo F. Morales

About this paper

Cite this paper

Ledeneva, Y. (2008). Effect of Preprocessing on Extractive Summarization with Maximal Frequent Sequences. In: Gelbukh, A., Morales, E.F. (eds) MICAI 2008: Advances in Artificial Intelligence. MICAI 2008. Lecture Notes in Computer Science, vol 5317. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88636-5_11

DOI: https://doi.org/10.1007/978-3-540-88636-5_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88635-8
Online ISBN: 978-3-540-88636-5
eBook Packages: Computer Science, Computer Science (R0)
