Abstract
Diseases and symptoms in medical records tend to appear in different content types: positive, negative, family history, and others. Traditional information retrieval systems that depend on keyword matching are often adversely affected by these content types, because a keyword match alone cannot distinguish, for example, a negated mention from a positive one. In this paper, we propose a novel learning approach that uses the content types as features to improve medical records search. Specifically, the different content types in the medical records are identified using a Bayesian-based classification method. We then introduce a type-based weighting function that takes advantage of the content types, in which the weight of each content type is calculated automatically by estimating the probability density functions in the documents. Finally, we evaluate the approach on the TREC 2011 and 2012 Medical Records data sets, where the experimental results show that our approach is promising and superior.
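To make the idea of type-based weighting concrete, the following is a minimal sketch, not the weighting function proposed in the paper. It assumes each token has already been labelled with a content type (by whatever classifier is used), and it uses hand-picked, hypothetical weights in `TYPE_WEIGHTS`; in the paper the weights are instead derived from estimated probability densities. The function name `weighted_tf` and the label names are likewise assumptions made for illustration.

```python
# Hypothetical content-type weights, chosen by hand for illustration only.
# The paper estimates its weights from probability densities instead.
TYPE_WEIGHTS = {
    "positive": 1.0,
    "negative": 0.1,
    "family_history": 0.3,
    "other": 0.5,
}


def weighted_tf(term, annotated_doc, weights=TYPE_WEIGHTS):
    """Sum the content-type weight of every occurrence of `term`.

    `annotated_doc` is a list of (token, content_type) pairs, so a term
    that occurs only in negated context contributes less than the same
    term in a positive context.
    """
    return sum(weights[ctype] for token, ctype in annotated_doc if token == term)


# "patient denies chest pain" versus "chest pain": plain keyword matching
# counts "pain" once in both records, the weighted count does not.
doc_negated = [
    ("patient", "other"),
    ("denies", "other"),
    ("chest", "negative"),
    ("pain", "negative"),
]
doc_positive = [("chest", "positive"), ("pain", "positive")]

score_negated = weighted_tf("pain", doc_negated)
score_positive = weighted_tf("pain", doc_positive)
```

Under these assumed weights, the record with the negated mention receives a lower score for the query term than the record with the positive mention, which is the behaviour a keyword-only system cannot provide.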
Acknowledgment
This research is funded by the Science and Technology Commission of Shanghai Municipality (No. 15PJ1401700 and No. 14511106803).
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
He, Y., Hu, Q., Song, Y., He, L. (2016). Estimating Probability Density of Content Types for Promoting Medical Records Search. In: Ferro, N., et al. Advances in Information Retrieval. ECIR 2016. Lecture Notes in Computer Science, vol 9626. Springer, Cham. https://doi.org/10.1007/978-3-319-30671-1_19
DOI: https://doi.org/10.1007/978-3-319-30671-1_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30670-4
Online ISBN: 978-3-319-30671-1
eBook Packages: Computer Science (R0)