Section Identification to Improve Information Extraction from Chinese Medical Literature

Zhou, Sijia; Li, Xin

doi:10.1007/978-3-030-03649-2_34

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10983)

Included in the following conference series:

International Conference on Smart Health

Abstract

The Chinese medical literature contains a large amount of knowledge. Reducing the effort medical scholars need to extract this knowledge requires literature analysis that identifies the key information in each paper. We argue that identifying the sections of a paper helps filter noise and increases the accuracy of extracting experimental findings. In this research in progress, we treat section identification as a sentence classification task and apply Conditional Random Fields (CRFs) to it. Our model combines lexical and structural features to facilitate section identification. Experiments on a human-curated asthma dataset show that our approach achieves a 10%–20% performance improvement over Support Vector Machines (SVMs), and that both bag-of-words features and domain lexicons benefit the task.
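
As a rough illustration of the feature design described in the abstract, the sketch below builds one feature dictionary per sentence, mixing lexical cues (bag-of-words indicators and a domain-lexicon hit count) with structural cues (sentence position in the paper). It is not the authors' implementation: the function names, the feature names, the English example tokens, and the contents of METHOD_LEXICON are all assumptions made for illustration, and sentences are assumed to be already segmented into tokens.

```python
from typing import Dict, List, Set

# Hypothetical domain lexicon; the paper's actual lexicons are not
# given in the abstract.
METHOD_LEXICON: Set[str] = {"randomized", "treatment", "dose"}


def sentence_features(doc: List[List[str]], i: int,
                      lexicon: Set[str]) -> Dict[str, float]:
    """Build lexical and structural features for sentence i of a document.

    `doc` is a list of sentences, each a list of tokens (assumed to be
    pre-segmented; Chinese word segmentation is outside this sketch).
    """
    tokens = doc[i]
    feats: Dict[str, float] = {"bias": 1.0}

    # Lexical: bag-of-words indicator for each distinct token.
    for tok in set(tokens):
        feats[f"bow={tok}"] = 1.0

    # Lexical: number of tokens found in the domain lexicon.
    feats["lexicon_hits"] = float(sum(tok in lexicon for tok in tokens))

    # Structural: where the sentence sits in the document.
    feats["rel_pos"] = i / max(len(doc) - 1, 1)
    feats["is_first"] = float(i == 0)
    feats["is_last"] = float(i == len(doc) - 1)
    return feats


def document_features(doc: List[List[str]],
                      lexicon: Set[str]) -> List[Dict[str, float]]:
    """Return one feature dict per sentence, in document order."""
    return [sentence_features(doc, i, lexicon) for i in range(len(doc))]


# Toy document of three tokenized sentences (illustrative only).
example_doc = [
    ["asthma", "is", "common"],
    ["patients", "received", "treatment", "dose"],
    ["symptoms", "improved"],
]
example_feats = document_features(example_doc, METHOD_LEXICON)
```

A sequence of per-sentence feature dictionaries like this is a common input shape for linear-chain CRF toolkits, which label all sentences of a paper jointly, so that the predicted section of one sentence can inform its neighbours. A per-sentence classifier such as an SVM would instead see each dictionary in isolation.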

Acknowledgements

The research is partially supported by the Digital Innovation Lab at City University of Hong Kong, Guangdong Science and Technology Project 2014A020221090, and the City University of Hong Kong Shenzhen Research Institute.

Author information

Authors and Affiliations

Department of Information Systems, City University of Hong Kong, Kowloon, Hong Kong
Sijia Zhou & Xin Li
Shenzhen Research Institute, City University of Hong Kong, Shenzhen, China
Sijia Zhou & Xin Li

Corresponding author

Correspondence to Sijia Zhou.

Editor information

Editors and Affiliations

University of Arizona, Tucson, AZ, USA
Hsinchun Chen
Wuhan University, Wuhan, China
Qing Fang
University of Arizona, Tucson, AZ, USA
Daniel Zeng
Wuhan University, Wuhan, China
Jiang Wu

About this paper

Cite this paper

Zhou, S., Li, X. (2018). Section Identification to Improve Information Extraction from Chinese Medical Literature. In: Chen, H., Fang, Q., Zeng, D., Wu, J. (eds) Smart Health. ICSH 2018. Lecture Notes in Computer Science, vol 10983. Springer, Cham. https://doi.org/10.1007/978-3-030-03649-2_34

DOI: https://doi.org/10.1007/978-3-030-03649-2_34
Published: 26 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03648-5
Online ISBN: 978-3-030-03649-2
eBook Packages: Computer Science, Computer Science (R0)
