Identifying Finding Sentences in Conclusion Subsections of Biomedical Abstracts

Li, Yingya; Yu, Bei

doi:10.1007/978-3-030-15742-5_64

Conference paper
First Online: 13 March 2019

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11420)

Abstract

Segmenting scientific abstracts and full texts by rhetorical function is an essential task in text classification. Small rhetorical segments are useful for fine-grained literature search, summarization, and comparison. Current efforts have focused on segmenting documents into general sections such as introduction, method, and conclusion, and much less on the roles of individual sentences within those segments. For example, not all sentences in a conclusion section describe research findings. In this work, we developed rule-based and machine learning methods and compared their performance in identifying finding sentences in the conclusion subsections of biomedical abstracts. To develop and evaluate the methods, we sampled 1,100 conclusion subsections from PubMed, drawn from studies with observational and randomized clinical trial designs and covering five common health topics. The rule-based method and the bag-of-words-based machine learning method both achieved high accuracy. The better performance of the simple rule-based approach shows that, although advanced machine learning approaches can capture the main patterns, human experts may still outperform them on such a specialized task.
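As a rough illustration of the two families of methods compared above, the following minimal Python sketch pairs a cue-phrase rule with a bag-of-words feature extractor. The abstract does not give the actual rules, features, or classifier, so everything here is an assumption made for illustration: the cue patterns, the function names (is_finding_by_rule, bag_of_words, accuracy), and the four toy sentences, which are invented and not taken from the PubMed sample.

```python
import re
from collections import Counter

# Hypothetical cue phrases for NON-finding sentences (e.g. calls for more
# research, recommendations). These are illustrative assumptions only; the
# rules used in the paper are not listed in the abstract.
NON_FINDING_CUES = [
    r"\bfurther (research|studies|study|trials?)\b",
    r"\b(is|are) (needed|warranted|required)\b",
    r"\bshould be\b",
    r"\btrial registration\b",
]
NON_FINDING_RE = re.compile("|".join(NON_FINDING_CUES), re.IGNORECASE)


def is_finding_by_rule(sentence: str) -> bool:
    """Label a sentence as a finding unless it matches a non-finding cue."""
    return NON_FINDING_RE.search(sentence) is None


def bag_of_words(sentence: str) -> Counter:
    """Return lowercased unigram counts, the usual bag-of-words features."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def accuracy(predicted, gold) -> float:
    """Fraction of predictions that agree with the gold labels."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Invented toy sentences with hand-assigned labels (True = finding).
EXAMPLES = [
    ("Statin use was associated with a lower risk of stroke.", True),
    ("Further research is needed to confirm these results.", False),
    ("The intervention reduced blood pressure at 12 months.", True),
    ("Clinicians should be aware of this potential interaction.", False),
]

if __name__ == "__main__":
    predictions = [is_finding_by_rule(text) for text, _ in EXAMPLES]
    labels = [label for _, label in EXAMPLES]
    print("rule accuracy on toy data:", accuracy(predictions, labels))
    print("features:", bag_of_words(EXAMPLES[0][0]).most_common(3))
```

In a machine learning setup, the bag_of_words counts would be passed to a trained classifier rather than inspected directly; the rule-based path needs no training data but relies on hand-written cues.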

Acknowledgement

We would like to thank Shiqi Qu, who contributed to the inter-coder agreement checking and corpus construction.

Author information

Authors and Affiliations

Syracuse University, Syracuse, NY, 13244, USA
Yingya Li & Bei Yu

Corresponding author

Correspondence to Bei Yu.

Editor information

Editors and Affiliations

University of South Florida, Tampa, FL, USA
Natalie Greene Taylor
University of Maryland, College Park, MD, USA
Caitlin Christian-Lamb
University of Washington, Seattle, WA, USA
Michelle H. Martin
University of California, Irvine, Irvine, CA, USA
Bonnie Nardi

About this paper

Cite this paper

Li, Y., Yu, B. (2019). Identifying Finding Sentences in Conclusion Subsections of Biomedical Abstracts. In: Taylor, N., Christian-Lamb, C., Martin, M., Nardi, B. (eds) Information in Contemporary Society. iConference 2019. Lecture Notes in Computer Science, vol 11420. Springer, Cham. https://doi.org/10.1007/978-3-030-15742-5_64

DOI: https://doi.org/10.1007/978-3-030-15742-5_64
Published: 13 March 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15741-8
Online ISBN: 978-3-030-15742-5
eBook Packages: Computer Science, Computer Science (R0)
