Estimating Probability Density of Content Types for Promoting Medical Records Search

He, Yun; Hu, Qinmin; Song, Yang; He, Liang

doi:10.1007/978-3-319-30671-1_19

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9626)

Included in the following conference series: European Conference on Information Retrieval

Abstract

Diseases and symptoms in medical records tend to appear in different content types: positive, negative, family history, and others. Traditional information retrieval systems that depend on keyword matching are often adversely affected by these content types. In this paper, we propose a novel learning approach that uses the content types as features to improve medical records search. In particular, the different content types in the medical records are identified using a Bayesian-based classification method. We then introduce a type-based weighting function that takes advantage of the content types, in which the weights of the content types are calculated automatically by estimating their probability density functions in the documents. Finally, we evaluate the approach on the TREC 2011 and 2012 Medical Records data sets, where the experimental results show that our approach is promising and superior.
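
The type-based weighting idea can be illustrated with a minimal sketch. The abstract does not state the weighting formula, so everything below is an assumption made for illustration only: the function names, the use of a Gaussian kernel with a fixed bandwidth, the choice of per-document type fractions as the variable whose density is estimated, and the normalisation of the weights. It should not be read as the method of the paper.

```python
import math


def gaussian_kde(samples, x, bandwidth):
    """Evaluate a Gaussian kernel density estimate at point x."""
    if not samples or bandwidth <= 0:
        raise ValueError("need at least one sample and a positive bandwidth")
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm


def estimate_type_weights(training_fractions, doc_fractions, bandwidth=0.1):
    """Assumed scheme: the weight of a content type is the density, estimated
    from training documents, at this document's fraction of mentions of that
    type. Weights are normalised to sum to 1."""
    raw = {
        t: gaussian_kde(training_fractions[t], doc_fractions[t], bandwidth)
        for t in doc_fractions
    }
    total = sum(raw.values())
    return {t: v / total for t, v in raw.items()}


def type_weighted_tf(occurrence_types, weights):
    """Add up the weight of each query-term occurrence by its content type."""
    return sum(weights[t] for t in occurrence_types)


# Hypothetical data: fraction of mentions of each content type in a few
# training documents, and in the document being scored.
training = {
    "positive": [0.60, 0.70, 0.65],
    "negative": [0.10, 0.15, 0.20],
    "family_history": [0.05, 0.10, 0.05],
    "other": [0.20, 0.10, 0.10],
}
doc = {"positive": 0.5, "negative": 0.3, "family_history": 0.1, "other": 0.1}

weights = estimate_type_weights(training, doc)
# A query term occurring twice in positive context and once in negated context.
score = type_weighted_tf(["positive", "positive", "negative"], weights)
```

In a full system, a weighted term frequency of this kind would replace the raw term frequency inside a ranking function such as BM25, and the content type of each occurrence would come from the classifier rather than being supplied by hand.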

Notes

1. http://terrier.org

Acknowledgment

This research is funded by the Science and Technology Commission of Shanghai Municipality (No. 15PJ1401700 and No. 14511106803).

Author information

Authors and Affiliations

Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University, Shanghai, 200241, China
Qinmin Hu & Liang He
Department of Computer Science and Technology, East China Normal University, Shanghai, 200241, China
Yun He, Qinmin Hu, Yang Song & Liang He

Corresponding author

Correspondence to Qinmin Hu.

Editor information

Editors and Affiliations

Department of Information Engineering, University of Padua, Padova, Italy
Nicola Ferro
Faculty of Informatics, University of Lugano (USI), Lugano, Switzerland
Fabio Crestani
Department of Computer Science, Katholieke Universiteit Leuven, Heverlee, Belgium
Marie-Francine Moens
Systèmes d’informations, Big Data et Recherche d’Information, Institut de Recherche en Informatique de Toulouse IRIT/équipe SIG, Toulouse Cedex 04, France
Josiane Mothe
Yahoo! Labs London, London, UK
Fabrizio Silvestri
Department of Information Engineering, University of Padua, Padova, Italy
Giorgio Maria Di Nunzio
TU Delft - EWI/ST/WIS, Delft, The Netherlands
Claudia Hauff
Department of Information Engineering, University of Padua, Padova, Italy
Gianmaria Silvello

About this paper

Cite this paper

He, Y., Hu, Q., Song, Y., He, L. (2016). Estimating Probability Density of Content Types for Promoting Medical Records Search. In: Ferro, N., et al. Advances in Information Retrieval. ECIR 2016. Lecture Notes in Computer Science, vol 9626. Springer, Cham. https://doi.org/10.1007/978-3-319-30671-1_19

DOI: https://doi.org/10.1007/978-3-319-30671-1_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30670-4
Online ISBN: 978-3-319-30671-1
eBook Packages: Computer Science, Computer Science (R0)
