Abstract
The Chinese medical literature contains a large amount of knowledge. Reducing the effort medical scholars need to extract this knowledge requires a literature analysis that identifies the key information in each paper. We argue that identifying the sections of a paper helps filter noise from the paper and increases the accuracy of extracting its experimental findings. In this research in progress, we treat paper section identification as a sentence classification task and apply Conditional Random Fields (CRFs) to it. Our model combines lexical and structural features to facilitate section identification. Experiments on a human-curated asthma dataset show that our approach achieves a 10%–20% performance improvement over Support Vector Machines (SVMs), and that using both bag-of-words features and domain lexicons benefits the task.
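The abstract does not list the exact features, label set, or training setup, so the following is only a minimal sketch of the general idea under stated assumptions. Each sentence gets lexical features (character bigrams as a hypothetical stand-in for bag-of-words, which avoids Chinese word segmentation, plus hits against a small hypothetical domain lexicon) and structural features (relative position, first-sentence flag). A linear-chain CRF then picks the best label sequence with Viterbi decoding. The label names, lexicon entries, example sentences, and all weights below are hypothetical and hand-set; a real system would learn the weights with a CRF toolkit. Only inference is shown, not training.

```python
# Hypothetical section labels (assumption; not the paper's label set).
LABELS = ["BACKGROUND", "METHODS", "RESULTS", "CONCLUSION"]

# Hypothetical domain lexicon (assumption): term -> lexicon feature tag.
LEXICON = {"方法": "METHOD_TERM", "随机": "METHOD_TERM",
           "结果": "RESULT_TERM", "显著": "RESULT_TERM",
           "结论": "CONCL_TERM"}


def sentence_features(sentences, i):
    """Lexical and structural features for sentence i (illustrative only)."""
    s = sentences[i]
    feats = {"bias": 1.0}
    # Lexical: character bigrams stand in for bag-of-words features.
    for a, b in zip(s, s[1:]):
        feats["bigram=" + a + b] = 1.0
    # Lexical: domain-lexicon hits.
    for term, tag in LEXICON.items():
        if term in s:
            feats["lex=" + tag] = 1.0
    # Structural: relative position of the sentence, and a first-sentence flag.
    feats["rel_pos"] = i / max(len(sentences) - 1, 1)
    if i == 0:
        feats["is_first"] = 1.0
    return feats


def viterbi(sentences, emit_w, trans_w):
    """Highest-scoring label sequence under a linear-chain CRF.

    emit_w[(feature, label)] and trans_w[(prev_label, label)] are weights a
    trained CRF would supply; missing entries count as 0.
    """
    if not sentences:
        return []
    feats = [sentence_features(sentences, i) for i in range(len(sentences))]

    def emit(i, y):
        return sum(v * emit_w.get((f, y), 0.0) for f, v in feats[i].items())

    score = [{y: emit(0, y) for y in LABELS}]
    back = [{}]
    for i in range(1, len(sentences)):
        score.append({})
        back.append({})
        for y in LABELS:
            # Best previous label: this is where neighbouring labels interact.
            prev = max(LABELS, key=lambda p: score[i - 1][p] + trans_w.get((p, y), 0.0))
            score[i][y] = score[i - 1][prev] + trans_w.get((prev, y), 0.0) + emit(i, y)
            back[i][y] = prev
    # Follow back-pointers from the best final label.
    y = max(LABELS, key=lambda lbl: score[-1][lbl])
    path = [y]
    for i in range(len(sentences) - 1, 0, -1):
        y = back[i][y]
        path.append(y)
    return path[::-1]


# Hand-set, hypothetical weights for illustration.
emit_w = {("lex=METHOD_TERM", "METHODS"): 2.0, ("lex=RESULT_TERM", "RESULTS"): 2.0,
          ("lex=CONCL_TERM", "CONCLUSION"): 2.0, ("is_first", "BACKGROUND"): 2.0,
          ("rel_pos", "BACKGROUND"): -1.0, ("rel_pos", "CONCLUSION"): 1.0}
trans_w = {(y, y): 0.5 for y in LABELS}      # mild preference to stay in a section
trans_w[("RESULTS", "BACKGROUND")] = -2.0    # discourage jumping back to background

# Hypothetical sentences from an asthma paper.
sentences = ["哮喘是常见的慢性呼吸道疾病。", "采用随机对照方法纳入120例患者。",
             "治疗组症状评分显著降低。", "结论：该疗法对哮喘有效。"]
print(viterbi(sentences, emit_w, trans_w))
# ['BACKGROUND', 'METHODS', 'RESULTS', 'CONCLUSION']
```

The transition weights are what separate this from a per-sentence classifier such as an SVM: each sentence's label is chosen jointly with its neighbours', so section order can inform the decision even when a sentence has no lexicon hit.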
Acknowledgements
This research was partially supported by the Digital Innovation Lab at City University of Hong Kong, GuangDong Science and Technology Project 2014A020221090, and the City University of Hong Kong Shenzhen Research Institute.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhou, S., Li, X. (2018). Section Identification to Improve Information Extraction from Chinese Medical Literature. In: Chen, H., Fang, Q., Zeng, D., Wu, J. (eds) Smart Health. ICSH 2018. Lecture Notes in Computer Science, vol 10983. Springer, Cham. https://doi.org/10.1007/978-3-030-03649-2_34
DOI: https://doi.org/10.1007/978-3-030-03649-2_34
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03648-5
Online ISBN: 978-3-030-03649-2
eBook Packages: Computer Science, Computer Science (R0)