Extractive Summarization via Overlap-Based Optimized Picking

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10569)

Abstract

Optimization-based methods treat summarization as a combinatorial optimization problem and formulate the objective as a weighted linear combination of criteria metrics. However, because these metrics are inconsistent with one another, it is hard to set proper weights. A subjectivity problem also arises, since most of these methods summarize the original texts. In this paper, we propose the overlap-based greedy picking (OGP) algorithm for citation-based extractive summarization. In the algorithm, an overlap is defined as a sentence that contains several topics. Since including overlaps in summaries indirectly affects salience, summary size, and content redundancy, OGP avoids the problem of inconsistent metrics while dynamically incorporating the criteria into the optimization. Although it is a greedy method, OGP is proven to achieve more than \((1-1/e)\) of the optimal solution. Since the citation context is composed of objective evaluations, OGP also addresses the subjectivity problem. Our experimental results show that OGP outperforms the baseline methods, and that the various criteria are effectively incorporated under the control of a single parameter \(\beta\).
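To make the greedy idea concrete, the following is a minimal sketch of classic greedy maximum coverage, in which each sentence is mapped to the set of topics it contains and sentences covering many topics are favored. It is not the OGP algorithm itself, whose objective and use of \(\beta\) are not specified in this abstract. The names `greedy_cover`, `sentence_topics`, and `budget` are hypothetical. The sketch relies on the standard result that greedy selection achieves at least \((1-1/e)\) of the optimum for a monotone submodular objective, such as topic coverage, under a cardinality constraint.

```python
from typing import Dict, List, Set


def greedy_cover(sentence_topics: Dict[str, Set[str]], budget: int) -> List[str]:
    """Pick up to `budget` sentences, each time taking the one that adds
    the most topics not yet covered (classic greedy maximum coverage)."""
    covered: Set[str] = set()
    summary: List[str] = []
    remaining = dict(sentence_topics)  # shallow copy; the input is not modified
    while remaining and len(summary) < budget:
        # Marginal gain = number of new topics this sentence would add.
        best = max(remaining, key=lambda s: len(remaining[s] - covered))
        if not remaining[best] - covered:
            break  # no sentence adds a new topic; stop early
        summary.append(best)
        covered |= remaining.pop(best)
    return summary


# Toy input (assumed, for illustration only): sentence id -> topics it contains.
toy = {
    "s1": {"t1", "t2", "t3"},
    "s2": {"t1"},
    "s3": {"t3", "t4"},
    "s4": {"t2"},
}
print(greedy_cover(toy, budget=2))
```

On the toy input, the sentence that spans three topics is picked first, which mirrors the intuition that a multi-topic sentence contributes to coverage while keeping the summary short and non-redundant.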

Author information

Corresponding author

Correspondence to Gaokun Dai.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Dai, G., Niu, Z. (2017). Extractive Summarization via Overlap-Based Optimized Picking. In: Bouguettaya, A., et al. Web Information Systems Engineering – WISE 2017. WISE 2017. Lecture Notes in Computer Science, vol 10569. Springer, Cham. https://doi.org/10.1007/978-3-319-68783-4_10

  • DOI: https://doi.org/10.1007/978-3-319-68783-4_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-68782-7

  • Online ISBN: 978-3-319-68783-4

  • eBook Packages: Computer Science, Computer Science (R0)
