Enhancing the Healthcare Retrieval with a Self-adaptive Saturated Density Function

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11439)

Abstract

Proximity-based information retrieval models usually apply the same pre-defined density function to every term in the collection when estimating the term's influence distribution. In the healthcare domain, however, different terms in the same document have different influence distributions, the same term has different influence distributions in different documents, and a pre-defined density function may not match a term's actual influence distribution. In this paper, we define the saturated density function as the density function that best fits a given term's influence distribution, and we propose a self-adaptive approach that builds a saturated density function for each term in each circumstance. Our approach is based on the Gamma process; it is unsupervised and requires no external resources. To evaluate its effectiveness, we construct a density-based weighting method. Finally, we conduct experiments on five standard CLEF and TREC datasets, and the results show that our approach is promising and outperforms pre-defined density functions in healthcare retrieval.
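To make the notion of a pre-defined density function concrete, the following is a minimal sketch of how a proximity-based model might spread a term's influence over neighbouring positions with a single Gaussian kernel. It is not the Gamma-process method of the paper: the function names, the toy document, and the per-term widths in `per_term_sigma` are all hypothetical and serve only to illustrate why one shared kernel width may fit some terms poorly.

```python
import math
from typing import Dict, List


def gaussian_kernel(distance: float, sigma: float) -> float:
    """Unnormalised Gaussian kernel: 1.0 at distance 0, decaying with distance."""
    return math.exp(-(distance * distance) / (2.0 * sigma * sigma))


def influence_profile(tokens: List[str], term: str, sigma: float) -> List[float]:
    """At each position, sum the kernel-propagated influence of every occurrence of `term`."""
    positions = [i for i, tok in enumerate(tokens) if tok == term]
    return [
        sum(gaussian_kernel(abs(i - p), sigma) for p in positions)
        for i in range(len(tokens))
    ]


# Toy document (hypothetical, for illustration only).
doc = "patient with chronic asthma was given inhaled steroid for asthma control".split()

# Pre-defined setting: one kernel width shared by all terms.
fixed = influence_profile(doc, "asthma", sigma=2.0)

# Illustrative alternative: a different width per term. These values are
# assumptions chosen by hand, not estimated by any method from the paper.
per_term_sigma: Dict[str, float] = {"asthma": 1.0, "steroid": 3.0}
adaptive = {
    term: influence_profile(doc, term, sigma)
    for term, sigma in per_term_sigma.items()
}
```

With a narrower width, the influence of "asthma" at positions between its two occurrences is smaller than under the shared width, which is the kind of mismatch a per-term density function is meant to address.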


Acknowledgements

We thank all reviewers who provided thoughtful and constructive comments on this paper. The second author is the corresponding author. This research is supported by the open funds of the NPPA Key Laboratory of Publishing Integration Development, ECNUP, the Shanghai Municipal Commission of Economy and Informatization (No. 170513), and Xiaoi Research. The computation was performed at the Supercomputer Center of ECNU.

Corresponding author

Correspondence to Yang Song.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Song, Y., Hu, W., He, L., Dou, L. (2019). Enhancing the Healthcare Retrieval with a Self-adaptive Saturated Density Function. In: Yang, Q., Zhou, ZH., Gong, Z., Zhang, ML., Huang, SJ. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2019. Lecture Notes in Computer Science, vol 11439. Springer, Cham. https://doi.org/10.1007/978-3-030-16148-4_39

  • DOI: https://doi.org/10.1007/978-3-030-16148-4_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-16147-7

  • Online ISBN: 978-3-030-16148-4

  • eBook Packages: Computer Science, Computer Science (R0)
