Textual Entailment–Based Figure Summarization for Biomedical Articles

Published: 17 April 2020

Abstract

This article proposes FigSum++, a novel unsupervised approach for automatic figure summarization in biomedical scientific articles that uses a multi-objective evolutionary algorithm. Summarization is treated as an optimization problem in which the sentences relevant to a given figure are selected according to several sentence scoring features (the objective functions): the textual entailment score between sentences in the summary and the figure’s caption, the number of sentences referring to that figure, the semantic similarity between sentences and the figure’s caption, and the number of words shared by sentences and the figure’s caption. These objective functions are optimized simultaneously using multi-objective binary differential evolution (MBDE). MBDE maintains a set of solutions, each of which represents a subset of sentences to be selected for the summary. MBDE generally uses a single differential evolution variant; the current study instead employs an ensemble of two variants, one targeting diversity among solutions and the other convergence toward the global optimal solution, for efficient search. In most summarization systems, diversity among the sentences of the summary (called anti-redundancy) is a critical feature, and it is usually calculated from a similarity measure, such as cosine similarity, between sentences. This article proposes a new way of measuring diversity in terms of textual entailment. To represent the sentences of the article as numeric vectors, the recently proposed BioBERT language model, pre-trained for biomedical text mining, is used. An ablation study is also presented to determine the importance of the different objective functions. The proposed technique is evaluated on two benchmark biomedical datasets containing 91 and 84 figures. The proposed system obtains improvements of 5% and 11% in F-measure on the two datasets, compared to the state-of-the-art unsupervised methods.
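The solution encoding and objective functions described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the crude tokenizer, the figure-reference regular expression, and the way scores are summed over the selected sentences are all assumptions made for illustration. The textual entailment objective is omitted because it requires a trained entailment model, and plain numeric lists stand in for BioBERT sentence vectors. Each candidate solution is a binary mask over the article's sentences, and one solution is preferred over another when it Pareto-dominates it.

```python
import math
import re
from typing import Sequence, Tuple


def tokens(text: str) -> set:
    """Lowercased alphanumeric tokens (a deliberately crude tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def word_overlap(sentence: str, caption: str) -> int:
    """Number of distinct words shared by a sentence and the caption."""
    return len(tokens(sentence) & tokens(caption))


def refers_to_figure(sentence: str, figure_no: int) -> bool:
    """True if the sentence mentions 'Figure N' or 'Fig. N' (assumed pattern)."""
    pattern = rf"\b(?:fig\.?|figure)\s*{figure_no}\b"
    return re.search(pattern, sentence, flags=re.IGNORECASE) is not None


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def objectives(
    mask: Sequence[int],
    sentences: Sequence[str],
    vectors: Sequence[Sequence[float]],
    caption: str,
    caption_vec: Sequence[float],
    figure_no: int,
) -> Tuple[int, int, float]:
    """Score one binary solution; every objective is to be maximized.

    mask[i] == 1 means sentence i is selected for the summary.
    Returns (word overlap, referring-sentence count, caption similarity).
    """
    chosen = [i for i, bit in enumerate(mask) if bit]
    overlap = sum(word_overlap(sentences[i], caption) for i in chosen)
    refs = sum(refers_to_figure(sentences[i], figure_no) for i in chosen)
    sim = sum(cosine(vectors[i], caption_vec) for i in chosen)
    return (overlap, refs, sim)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance for maximization: a is no worse everywhere, better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))
```

A multi-objective optimizer such as MBDE would evaluate `objectives` for every mask in its population and use `dominates` to retain the non-dominated solutions; the mutation and crossover steps that generate new masks are not shown here.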

Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 16, Issue 1s
Special Issue on Multimodal Machine Learning for Human Behavior Analysis and Special Issue on Computational Intelligence for Biomedical Data and Imaging
January 2020
376 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3388236

Copyright © 2020 ACM

© 2020 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 17 April 2020
• Revised: 1 August 2019
• Accepted: 1 August 2019
• Received: 1 May 2019

Qualifiers

• research-article
• Research
• Refereed

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      HTML Format

      View this article in HTML Format .

      View HTML Format