Abstract
Segmenting scientific abstracts and full text by rhetorical function is an essential task in text classification. Small rhetorical segments can be useful for fine-grained literature search, summarization, and comparison. Current efforts have focused on segmenting documents into general sections such as introduction, method, and conclusion, and much less on the roles of individual sentences within those segments. For example, not all sentences in a conclusion section describe research findings. In this work, we developed rule-based and machine learning methods and compared their performance in identifying the finding sentences in conclusion subsections of biomedical abstracts. To develop and evaluate the methods, 1100 conclusion subsections from abstracts with observational or randomized clinical trial study designs, covering five common health topics, were sampled from PubMed. The rule-based method and the bag-of-words based machine learning method both achieved high accuracy. The better performance of the simple rule-based approach shows that although advanced machine learning approaches can capture the main patterns, human experts may still outperform them on such a specialized task.
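The two families of methods named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the cue patterns, the function names, and the three toy sentences are invented here and are not the rules, features, or data used in the paper. The sketch only shows the general shape of a cue-phrase rule and of a bag-of-words representation.

```python
import re
from collections import Counter

# Hypothetical cue patterns, invented for this sketch (NOT the paper's rules).
# They flag sentences that recommend or call for more work rather than
# report a finding.
NON_FINDING_CUES = [
    r"\bfurther (research|studies|study|work)\b",
    r"\b(is|are) (needed|warranted|required)\b",
    r"\bshould be\b",
    r"\bfuture (research|studies|trials)\b",
]


def rule_based_is_finding(sentence: str) -> bool:
    """Label a sentence as a finding unless it matches a non-finding cue."""
    text = sentence.lower()
    return not any(re.search(pattern, text) for pattern in NON_FINDING_CUES)


def bag_of_words(sentence: str) -> Counter:
    """Return unordered token counts, the input a bag-of-words classifier uses."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def accuracy(predictions, gold) -> float:
    """Fraction of predictions that agree with the gold labels."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


# Toy sentences written for this sketch, with hand-assigned labels
# (True = finding sentence). They are not drawn from the paper's corpus.
EXAMPLES = [
    ("Vitamin D supplementation was associated with a lower risk of fracture.", True),
    ("Further studies are needed to confirm these results.", False),
    ("Clinicians should be aware of this interaction.", False),
]

if __name__ == "__main__":
    predictions = [rule_based_is_finding(text) for text, _ in EXAMPLES]
    gold = [label for _, label in EXAMPLES]
    print("rule-based accuracy on toy sentences:", accuracy(predictions, gold))
    print("bag-of-words features:", bag_of_words(EXAMPLES[1][0]))
```

In a machine learning setup, the token counts from `bag_of_words` would be fed to a trained classifier instead of being matched against hand-written patterns; the choice of classifier is left open here because the abstract does not name one.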
Acknowledgement
We would like to thank Shiqi Qu, who contributed to the inter-coder agreement checking and corpus construction.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Li, Y., Yu, B. (2019). Identifying Finding Sentences in Conclusion Subsections of Biomedical Abstracts. In: Taylor, N., Christian-Lamb, C., Martin, M., Nardi, B. (eds) Information in Contemporary Society. iConference 2019. Lecture Notes in Computer Science, vol 11420. Springer, Cham. https://doi.org/10.1007/978-3-030-15742-5_64
DOI: https://doi.org/10.1007/978-3-030-15742-5_64
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15741-8
Online ISBN: 978-3-030-15742-5
eBook Packages: Computer Science, Computer Science (R0)