Abstract
Proximity-based information retrieval models usually apply the same pre-defined density function to every term in the collection to estimate its influence distribution. In the healthcare domain, however, different terms in the same document have different influence distributions, the same term has different influence distributions in different documents, and a pre-defined density function may not match a term's actual influence distribution. In this paper, we define a saturated density function as the density function that best fits a given term's influence distribution, and we propose a self-adaptive approach that builds a saturated density function for each term in various circumstances. In particular, our approach, which utilizes the Gamma process, is an unsupervised model that requires no external resources. We then construct a density-based weighting method to evaluate the effectiveness of our approach. Finally, we conduct experiments on five standard CLEF and TREC datasets; the results show that our approach is promising and outperforms the pre-defined density functions in healthcare retrieval.
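The abstract does not spell out the Gamma-process construction, so the following Python sketch is only a hypothetical illustration of the contrast it draws. It is not the authors' method. A pre-defined kernel (Gaussian or triangle, in the style of positional language models) uses one shape and one width for every term. A per-term density is instead estimated from that term's own positions, here via a simple method-of-moments Gamma fit to the gaps between consecutive occurrences. The function names (`gaussian`, `triangle`, `gaps`, `fit_gamma`, `gamma_pdf`) and the toy positions are assumptions made for illustration.

```python
import math
from statistics import mean, pvariance


# Pre-defined kernels: one shape and one width (sigma) shared by every term.
def gaussian(d: float, sigma: float) -> float:
    return math.exp(-(d * d) / (2.0 * sigma * sigma))


def triangle(d: float, sigma: float) -> float:
    return max(0.0, 1.0 - abs(d) / sigma)


def gaps(positions):
    """Distances between consecutive occurrences of one term in one document."""
    ps = sorted(positions)
    return [b - a for a, b in zip(ps, ps[1:])]


def fit_gamma(samples):
    """Hypothetical per-term density: method-of-moments Gamma fit.

    For Gamma(shape k, scale theta): mean = k * theta and variance = k * theta**2,
    hence k = mean**2 / variance and theta = variance / mean.
    """
    m = mean(samples)
    v = pvariance(samples)
    if m <= 0 or v <= 0:
        raise ValueError("need positive samples with non-zero variance")
    return m * m / v, v / m


def gamma_pdf(x: float, k: float, theta: float) -> float:
    """Gamma density, evaluated in log space for numerical stability."""
    if x <= 0:
        return 0.0
    log_p = (k - 1) * math.log(x) - x / theta - math.lgamma(k) - k * math.log(theta)
    return math.exp(log_p)


if __name__ == "__main__":
    doc_a = [3, 5, 8, 10, 14]      # toy data: the term occurs in a tight cluster
    doc_b = [2, 30, 41, 90, 130]   # toy data: the same term is spread out
    k_a, t_a = fit_gamma(gaps(doc_a))
    k_b, t_b = fit_gamma(gaps(doc_b))
    # The fixed kernel's value at distance 3 does not depend on the document.
    print(gaussian(3, sigma=25.0))
    # The fitted densities at distance 3 differ between the two documents.
    print(gamma_pdf(3, k_a, t_a), gamma_pdf(3, k_b, t_b))
```

The moment estimator is used only because it needs nothing beyond the standard library. A maximum-likelihood fit would be a natural alternative, and neither should be read as the paper's saturated density function.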
Acknowledgements
We thank the reviewers for their thoughtful and constructive comments on this paper. The second author is the corresponding author. This research is supported by the open funds of the NPPA Key Laboratory of Publishing Integration Development, ECNUP, the Shanghai Municipal Commission of Economy and Informatization (No. 170513), and Xiaoi Research. The computations were performed at the Supercomputer Center of ECNU.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Song, Y., Hu, W., He, L., Dou, L. (2019). Enhancing the Healthcare Retrieval with a Self-adaptive Saturated Density Function. In: Yang, Q., Zhou, ZH., Gong, Z., Zhang, ML., Huang, SJ. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2019. Lecture Notes in Computer Science, vol 11439. Springer, Cham. https://doi.org/10.1007/978-3-030-16148-4_39
DOI: https://doi.org/10.1007/978-3-030-16148-4_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16147-7
Online ISBN: 978-3-030-16148-4
eBook Packages: Computer Science, Computer Science (R0)