Estimating Probability Density of Content Types for Promoting Medical Records Search

  • Conference paper
Advances in Information Retrieval (ECIR 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9626)

Abstract

Diseases and symptoms in medical records tend to appear in different content types: positive, negative, family history, and others. Traditional information retrieval systems that depend on keyword matching are often adversely affected by these content types. In this paper, we propose a novel learning approach that uses the content types as features to improve medical records search. In particular, the different content types in the medical records are identified using a Bayesian-based classification method. We then introduce a type-based weighting function that takes advantage of the content types, in which the weights of the content types are calculated automatically by estimating their probability density functions in the documents. Finally, we evaluate the approach on the TREC 2011 and 2012 Medical Records data sets, where our experimental results show that the approach is promising and superior.
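
As a rough illustration of the first step described above (assigning a content type to a piece of record text with a Bayesian classifier), the following is a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing. It is not the authors' implementation: the abstract does not specify their features, training data, or classifier variant. The training sentences, the label names, and the functions `train_naive_bayes` and `classify` are all hypothetical and exist only for this example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data. The four labels mirror the content types
# named in the abstract; the sentences are invented for illustration only.
TRAIN = [
    ("patient has chest pain and fever", "positive"),
    ("patient presents with diabetes", "positive"),
    ("patient denies chest pain", "negative"),
    ("no evidence of fever", "negative"),
    ("mother had diabetes", "family_history"),
    ("father died of heart disease", "family_history"),
    ("follow up in two weeks", "other"),
    ("discharged home in stable condition", "other"),
]


def train_naive_bayes(examples):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def classify(text, model):
    """Return the label with the highest log posterior under naive Bayes."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Log prior: fraction of training sentences with this label.
        score = math.log(label_counts[label] / total)
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            if word in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(TRAIN)
print(classify("patient denies fever", model))
print(classify("mother had heart disease", model))
```

In a retrieval setting, labels predicted this way could serve as the content-type features that a type-based weighting function consumes. How the paper derives the weights from estimated probability densities is not described in the abstract, so that step is not sketched here.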

Notes

  1. http://terrier.org

Acknowledgment

This research is funded by the Science and Technology Commission of Shanghai Municipality (No.15PJ1401700 and No.14511106803).

Author information

Corresponding author

Correspondence to Qinmin Hu.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

He, Y., Hu, Q., Song, Y., He, L. (2016). Estimating Probability Density of Content Types for Promoting Medical Records Search. In: Ferro, N., et al. Advances in Information Retrieval. ECIR 2016. Lecture Notes in Computer Science, vol 9626. Springer, Cham. https://doi.org/10.1007/978-3-319-30671-1_19

  • DOI: https://doi.org/10.1007/978-3-319-30671-1_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-30670-4

  • Online ISBN: 978-3-319-30671-1

  • eBook Packages: Computer Science, Computer Science (R0)
