Abstract
This article proposes FigSum++, a novel unsupervised approach to automatic figure summarization in biomedical scientific articles based on a multi-objective evolutionary algorithm. Summarization is treated as an optimization problem in which the sentences relevant to a given figure are selected using several sentence scoring features (objective functions): the textual entailment score between sentences in the summary and the figure’s caption, the number of sentences referring to the figure, the semantic similarity between sentences and the figure’s caption, and the number of overlapping words between sentences and the figure’s caption. These objective functions are optimized simultaneously using multi-objective binary differential evolution (MBDE). MBDE maintains a set of solutions, each representing a subset of sentences to be selected for the summary. MBDE generally uses a single differential evolution variant; the current study instead employs an ensemble of two variants, one aimed at diversity among solutions and the other at convergence toward the global optimal solution, for efficient search. In any summarization system, diversity among the sentences in the summary (called anti-redundancy) is a critical feature, and it is usually calculated in terms of similarity (such as cosine similarity) among sentences. This article proposes a new way of measuring diversity in terms of textual entailment. To represent the sentences of an article as numeric vectors, BioBERT, a recently proposed language model pre-trained for biomedical text mining, is used. An ablation study is also presented to determine the importance of the different objective functions. The proposed technique is evaluated on two benchmark biomedical datasets containing 91 and 84 figures. The proposed system obtains improvements of 5% and 11% in F-measure on the two datasets over state-of-the-art unsupervised methods.
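The formulation in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: all function names and the sample sentences are hypothetical, only two objectives are shown, and a bag-of-words cosine similarity stands in for the BioBERT vectors and textual entailment scores that the article uses. Each candidate summary is a binary mask over the article's sentences, and candidates are compared by Pareto dominance, assuming both objectives are maximized.

```python
import math
from collections import Counter
from itertools import combinations

def tokens(text):
    # Crude tokenizer (assumption): lowercase, strip surrounding punctuation.
    return [w.strip(".,;:()").lower() for w in text.split() if w.strip(".,;:()")]

def cosine(a, b):
    # Bag-of-words cosine similarity; a stand-in for embedding-based similarity.
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def caption_overlap(mask, sentences, caption):
    # Objective 1: number of caption words shared with each selected sentence.
    cap = set(tokens(caption))
    return sum(len(cap & set(tokens(s))) for bit, s in zip(mask, sentences) if bit)

def anti_redundancy(mask, sentences):
    # Objective 2: one minus the mean pairwise similarity of selected sentences.
    chosen = [s for bit, s in zip(mask, sentences) if bit]
    pairs = list(combinations(chosen, 2))
    if not pairs:
        return 1.0
    return 1.0 - sum(cosine(a, b) for a, b in pairs) / len(pairs)

def evaluate(mask, sentences, caption):
    # Objective vector for one binary solution (both objectives maximized).
    return (caption_overlap(mask, sentences, caption), anti_redundancy(mask, sentences))

def dominates(f, g):
    # Pareto dominance: f is no worse on every objective and better on at least one.
    return all(x >= y for x, y in zip(f, g)) and any(x > y for x, y in zip(f, g))

# Hypothetical sample data for illustration only.
sentences = [
    "Figure 2 shows protein expression in liver cells.",
    "Protein expression in liver cells is shown in Figure 2.",
    "The weather was sunny.",
]
caption = "Protein expression in liver cells"

single = evaluate([1, 0, 0], sentences, caption)      # one relevant sentence
redundant = evaluate([1, 1, 0], sentences, caption)   # two near-duplicate sentences
irrelevant = evaluate([0, 0, 1], sentences, caption)  # one unrelated sentence
```

In this toy example, the single relevant sentence dominates the irrelevant one, while the redundant pair and the single sentence do not dominate each other: the pair overlaps more with the caption but is less diverse. Trade-offs of this kind are why a multi-objective search keeps a set of non-dominated solutions instead of a single best one.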
Index Terms
- Textual Entailment--Based Figure Summarization for Biomedical Articles