
Textual Entailment–Based Figure Summarization for Biomedical Articles

Authors:
Naveen Saini, Indian Institute of Technology Patna, Bihar, India (ORCID: 0000-0002-2421-1457)
Sriparna Saha, Indian Institute of Technology Patna, Bihar, India
Pushpak Bhattacharyya, Indian Institute of Technology Patna, Bihar, India
Himanshu Tuteja, Birla Institute of Technology Mesra, Jaipur, India

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 16, Issue 1s, Article No. 35, pp. 1–24. https://doi.org/10.1145/3357334

Published: 17 April 2020

Abstract

This article proposes a novel unsupervised approach (FigSum++) for automatic figure summarization in biomedical scientific articles using a multi-objective evolutionary algorithm. The task is treated as an optimization problem in which the sentences included in the summary of a given figure are selected on the basis of several sentence-scoring features (objective functions): the textual entailment score between the summary sentences and the figure's caption, the number of sentences referring to the figure, the semantic similarity between the sentences and the caption, and the number of words the sentences share with the caption. These objective functions are optimized simultaneously using multi-objective binary differential evolution (MBDE). MBDE maintains a population of solutions, each of which represents a subset of sentences to be selected for the summary. MBDE generally uses a single differential evolution variant; in the current study, an ensemble of two variants, one aimed at diversity among solutions and the other at convergence toward the globally optimal solution, is employed for a more efficient search. In any summarization system, diversity among the summary sentences (anti-redundancy) is a critical feature, and it is usually calculated with a similarity measure such as cosine similarity. This article instead proposes measuring diversity in terms of textual entailment. To represent the sentences of the article as numeric vectors, BioBERT, a recently proposed language model pre-trained for biomedical text mining, is used. An ablation study is also presented to determine the importance of the different objective functions. The proposed technique is evaluated on two benchmark biomedical datasets containing 91 and 84 figures, respectively.
On these two datasets, the proposed system achieves improvements of 5% and 11% in F-measure over state-of-the-art unsupervised methods.
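The abstract describes the sentence-scoring objectives only at a high level. The Python sketch below shows one possible way to score a candidate solution (a binary vector over the article's sentences) against objectives of that kind, and to compare two solutions by Pareto dominance, as a multi-objective optimizer such as MBDE must. It is not the authors' implementation, and several pieces are assumptions: the function names (`objectives`, `lexical_entail`, `dominates`), the tokenizer, the choice of averaging versus summing each score, the use of cosine similarity for semantic similarity, and the anti-redundancy formula (1 minus the strongest entailment between any two selected sentences). Most importantly, `lexical_entail` is a hypothetical word-containment proxy standing in for a trained textual entailment model, and the two-dimensional vectors are placeholders for sentence embeddings such as those BioBERT would produce.

```python
import math
import re
from itertools import permutations
from typing import Callable, List, Sequence, Set, Tuple

Vector = Sequence[float]
# (premise, hypothesis) -> entailment score in [0, 1]
EntailFn = Callable[[str, str], float]


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens (a deliberately simple tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def lexical_entail(premise: str, hypothesis: str) -> float:
    """ASSUMPTION: toy stand-in for a trained entailment model. Returns the
    fraction of hypothesis tokens that also occur in the premise."""
    hyp = tokens(hypothesis)
    return len(hyp & tokens(premise)) / len(hyp) if hyp else 0.0


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def objectives(selection: Sequence[int], sentences: List[str],
               vectors: List[Vector], caption: str, caption_vec: Vector,
               figure_ref: str, entail: EntailFn = lexical_entail
               ) -> Tuple[float, float, float, float, float]:
    """Score one binary solution (bit i = 1 means sentence i is in the
    summary). All five objectives are to be maximized."""
    chosen = [i for i, bit in enumerate(selection) if bit]
    if not chosen:
        return (0.0, 0.0, 0.0, 0.0, 0.0)
    cap_tokens = tokens(caption)
    ref = re.compile(r"\b" + re.escape(figure_ref) + r"\b", re.IGNORECASE)
    # 1. Mean entailment score of the caption given each selected sentence.
    te_caption = sum(entail(sentences[i], caption) for i in chosen) / len(chosen)
    # 2. Number of selected sentences that mention the figure, e.g. "Figure 2".
    refs = sum(1 for i in chosen if ref.search(sentences[i]))
    # 3. Mean cosine similarity between sentence and caption embeddings.
    sim = sum(cosine(vectors[i], caption_vec) for i in chosen) / len(chosen)
    # 4. Sum over selected sentences of the distinct words shared with the caption.
    overlap = sum(len(tokens(sentences[i]) & cap_tokens) for i in chosen)
    # 5. Entailment-based anti-redundancy: 1 minus the strongest entailment
    #    between any ordered pair of selected sentences.
    redundancy = max((entail(sentences[i], sentences[j])
                      for i, j in permutations(chosen, 2)), default=0.0)
    return (te_caption, float(refs), sim, float(overlap), 1.0 - redundancy)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance under maximization: a is no worse than b on every
    objective and strictly better on at least one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


# Toy usage; the 2-D vectors are placeholders for real sentence embeddings.
sentences = [
    "Figure 2 shows that protein X binds receptor Y.",
    "Protein X binds receptor Y in vitro.",
    "The cohort included 40 patients.",
]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
caption, caption_vec = "Protein X binds receptor Y.", [1.0, 0.0]

a = objectives([1, 0, 0], sentences, vectors, caption, caption_vec, "Figure 2")
b = objectives([1, 1, 0], sentences, vectors, caption, caption_vec, "Figure 2")
print(a)
print(b)
print(dominates(a, b), dominates(b, a))  # neither solution dominates
```

In this toy run, adding the second sentence raises word overlap but lowers mean similarity and, because that sentence largely restates the first, sharply lowers the entailment-based anti-redundancy score, so neither solution dominates the other. Trade-offs of this kind are the reason a multi-objective optimizer keeps the objectives separate and returns a set of non-dominated solutions rather than a single weighted sum.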

Supplemental Material

saini.zip (1.2 MB): supplemental movie, appendix, image and software files for Textual Entailment–Based Figure Summarization for Biomedical Articles

References

Stergos Afantenos, Vangelis Karkaletsis, and Panagiotis Stamatopoulos. 2005. Summarization from medical documents: A survey. Artificial Intelligence in Medicine 33, 2 (2005), 157–177.
Shashank Agarwal and Hong Yu. 2009. FigSum: Automatically generating structured text summaries for figures in biomedical literature. In AMIA Annual Symposium Proceedings, Vol. 2009. American Medical Informatics Association, Bethesda, MD, 6–10.
Rasim M. Alguliev, Ramiz M. Aliguliyev, and Nijat R. Isazade. 2012. DESAMC+DocSum: Differential evolution with self-adaptive mutation and crossover parameters for multi-document summarization. Knowledge-Based Systems 36 (2012), 21–38.
Rasim M. Alguliyev, Ramiz M. Aliguliyev, and Nijat R. Isazade. 2013. Multiple documents summarization based on evolutionary optimization algorithm. Expert Systems with Applications 40 (2013), 1675–1689.
Sanghamitra Bandyopadhyay, Sriparna Saha, Ujjwal Maulik, and Kalyanmoy Deb. 2008. A simulated annealing-based multiobjective optimization algorithm: AMOSA. IEEE Transactions on Evolutionary Computation 12, 3 (2008), 269–283.
Sumit Bhatia and Prasenjit Mitra. 2012. Summarizing figures, tables, and algorithms in scientific publications to augment search results. ACM Transactions on Information Systems 30, 1 (2012), 3.
Swagatam Das and Ponnuthurai Nagaratnam Suganthan. 2011. Differential evolution: A survey of the state-of-the-art. IEEE Transactions on Evolutionary Computation 15, 1 (2011), 4–31.
Kalyanmoy Deb, Amrit Pratap, Sameer Agarwal, and Tamt Meyarivan. 2002. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6, 2 (2002), 182–197.
Daniel M. Dunlavy, Dianne P. O'Leary, John M. Conroy, and Judith D. Schlesinger. 2007. QCS: A system for querying, clustering and summarizing documents. Information Processing & Management 43, 6 (2007), 1588–1605.
Soumi Dutta, Vibhash Chandra, Kanav Mehra, Asit Kumar Das, Tanmoy Chakraborty, and Saptarshi Ghosh. 2018. Ensemble algorithms for microblog summarization. IEEE Intelligent Systems 33, 3 (2018), 4–14.
Rafael Ferreira, Luciano de Souza Cabral, Rafael Dueire Lins, Gabriel Pereira e Silva, Fred Freitas, George D. C. Cavalcanti, Rinaldo Lima, Steven J. Simske, and Luciano Favaro. 2013. Assessing sentence scoring techniques for extractive text summarization. Expert Systems with Applications 40, 14 (2013), 5755–5764.
Robert P. Futrelle. 1999. Summarization of diagrams in documents. In Advances in Automated Text Summarization, I. Mani and M. Maybury (Eds.). MIT Press, Cambridge, MA, 403–421.
Robert P. Futrelle. 2004. Handling figures in document summarization. In Proceedings of the Workshop at the Annual Meeting of the Association for Computational Linguistics. 61–65.
Eugene J. Guglielmo and Neil C. Rowe. 1996. Natural-language retrieval of images based on descriptive captions. ACM Transactions on Information Systems 14, 3 (1996), 237–267.
Zhenan He and Gary G. Yen. 2016. Visualization and performance metric in many-objective optimization. IEEE Transactions on Evolutionary Computation 20, 3 (2016), 386–402.
Anna Huang. 2008. Similarity measures for text document clustering. In Proceedings of the 6th New Zealand Computer Science Research Student Conference (NZCSRSC'08), Vol. 4. 9–56.
Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, and Jaewoo Kang. 2019. BioBERT: Pre-trained biomedical language representation model for biomedical text mining. arXiv:1901.08746.
Inderjeet Mani and Mark T. Maybury. 2001. Automatic summarization. Computational Linguistics 28, 2 (2001), 221–223.
Ali Wagdy Mohamed, Hegazy Zaher Sabry, and Tareq Abd-Elaziz. 2013. Real parameter optimization by an effective differential evolution algorithm. Egyptian Informatics Journal 14, 1 (2013), 37–53.
Ramesh Nallapati, Feifei Zhai, and Bowen Zhou. 2017. SummaRuNNer: A recurrent neural network based sequence model for extractive summarization of documents. In Proceedings of the 31st AAAI Conference on Artificial Intelligence.
Shashi Narayan, Nikos Papasarantopoulos, Shay B. Cohen, and Mirella Lapata. 2017. Neural extractive summarization with side information. arXiv:1704.04530.
Tadashi Nomoto and Yuji Matsumoto. 2001. A new approach to unsupervised text summarization. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, NY, 26–34.
Hilário Oliveira, Rafael Dueire Lins, Rinaldo Lima, and Fred Freitas. 2017. A regression-based approach using integer linear programming for single-document summarization. In Proceedings of the 2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI'17). IEEE, Los Alamitos, CA, 270–277.
Rebecca Passonneau, Karen Kukich, Jacques Robin, Vasileios Hatzivassiloglou, Larry Lefkowitz, and Hongyan Jing. 1996. Generating summaries of workflow diagrams. In Proceedings of the International Conference on Natural Language Processing and Industrial Applications. 204–210.
Balaji Polepalli Ramesh, Ricky J. Sethi, and Hong Yu. 2015. Figure-associated text summarization and evaluation. PLoS One 10, 2 (2015), e0115671.
Juan Enrique Ramos. 2003. Using TF-IDF to determine word relevance in document queries. In Proceedings of the 1st Instructional Conference on Machine Learning, Vol. 242. 133–142.
Irina Rish. 2001. An empirical study of the naive Bayes classifier. In Proceedings of the IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence, Vol. 3. 41–46.
Alexey Romanov and Chaitanya Shivade. 2018. Lessons from natural language inference in the clinical domain. arXiv:1808.06752.
Alexander M. Rush, Sumit Chopra, and Jason Weston. 2015. A neural attention model for abstractive sentence summarization. arXiv:1509.00685.
Sriparna Saha, Sayantan Mitra, and Stefan Kramer. 2018. Exploring multiobjective optimization for multiview clustering. ACM Transactions on Knowledge Discovery from Data 12, 4 (2018), 44.
Naveen Saini, Sriparna Saha, and Pushpak Bhattacharyya. 2018. Automatic scientific document clustering using self-organized multi-objective differential evolution. Cognitive Computation 11 (Dec. 2019), 271–293. DOI: https://doi.org/10.1007/s12559-018-9611-8
Naveen Saini, Sriparna Saha, and Pushpak Bhattacharyya. 2019. Multiobjective-based approach for microblog summarization. IEEE Transactions on Computational Social Systems 6, 6 (2019), 1219–1231.
Naveen Saini, Sriparna Saha, Aditya Harsh, and Pushpak Bhattacharyya. 2019. Sophisticated SOM based genetic operators in multi-objective clustering framework. Applied Intelligence 49, 5 (2019), 1803–1822.
Naveen Saini, Sriparna Saha, Anubhav Jangra, and Pushpak Bhattacharyya. 2019. Extractive single document summarization using multi-objective optimization: Exploring self-organized differential evolution, grey wolf optimizer and water cycle algorithm. Knowledge-Based Systems 164 (2019), 45–67.
Gerard Salton and Christopher Buckley. 1988. Term-weighting approaches in automatic text retrieval. Information Processing & Management 24, 5 (1988), 513–523.
Jesus M. Sanchez-Gomez, Miguel A. Vega-Rodríguez, and Carlos J. Pérez. 2018. Extractive multi-document text summarization using a multi-objective artificial bee colony optimization approach. Knowledge-Based Systems 159 (2018), 1–8.
Rainer Storn and Kenneth Price. 1997. Differential evolution – A simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization 11, 4 (1997), 341–359.
Xiaojun Wan, Jianwu Yang, and Jianguo Xiao. 2007. Manifold-ranking based topic-focused multi-document summarization. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI'07), Vol. 7. 2903–2908.
Bing-Chuan Wang, Han-Xiong Li, Jia-Peng Li, and Yong Wang. 2018. Composite differential evolution for constrained evolutionary optimization. IEEE Transactions on Systems, Man, and Cybernetics: Systems 99 (2018), 1–14.
Lipo Wang. 2005. Support Vector Machines: Theory and Applications. Vol. 177. Springer Science & Business Media.
Ling Wang, Xiping Fu, Muhammad Ilyas Menhas, and Minrui Fei. 2010. A modified binary differential evolution algorithm. In Life System Modeling and Intelligent Computing. Springer, 49–57.
Yong Wang, Zixing Cai, and Qingfu Zhang. 2011. Differential evolution with composite trial vector generation strategies and control parameters. IEEE Transactions on Evolutionary Computation 15, 1 (2011), 55–66.
Peng Wu and Sandra Carberry. 2011. Toward extractive summarization of multimodal documents. In Proceedings of the Workshop on Text Summarization at the Canadian Conference on Artificial Intelligence. 53–61.
Jen-Yuan Yeh, Hao-Ren Ke, Wei-Pang Yang, and I-Heng Meng. 2005. Text summarization using a trainable summarizer and latent semantic analysis. Information Processing & Management 41, 1 (2005), 75–95.
Hong Yu, Shashank Agarwal, Mark Johnston, and Aaron Cohen. 2009. Are figure legends sufficient? Evaluating the contribution of associated text to biomedical figure comprehension. Journal of Biomedical Discovery and Collaboration 4, 1 (2009), 1.
Dan Zhang and Bin Wei. 2014. Comparison between differential evolution and particle swarm optimization algorithms. In Proceedings of the IEEE International Conference on Mechatronics and Automation (ICMA'14). IEEE, Los Alamitos, CA, 239–244.

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Information extraction
      2. Summarization


Published in
ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 16, Issue 1s
Special Issue on Multimodal Machine Learning for Human Behavior Analysis and Special Issue on Computational Intelligence for Biomedical Data and Imaging
January 2020, 376 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3388236
Editor: Alberto Del Bimbo, University of Firenze, Italy
Copyright © 2020 ACM
© 2020 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
Publisher
Association for Computing Machinery, New York, NY, United States
Publication History
- Published: 17 April 2020
- Revised: 1 August 2019
- Accepted: 1 August 2019
- Received: 1 May 2019
Author Tags
Figure-assisted text summarization
evolutionary computing
multi-objective optimization (MOO)
textual entailment
Qualifiers
- research-article
- Research
- Refereed
Article Metrics
- Total Citations: 10
- Total Downloads: 292
- Downloads (last 12 months): 30
- Downloads (last 6 weeks): 6