Enhancing the Healthcare Retrieval with a Self-adaptive Saturated Density Function

Song, Yang; Hu, Wenxin; He, Liang; Dou, Liang

doi:10.1007/978-3-030-16148-4_39


Conference paper
First Online: 22 March 2019


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11439)

Abstract

Proximity-based information retrieval models usually apply the same pre-defined density function to all terms in the collection to estimate their influence distributions. In the healthcare domain, however, different terms in the same document have different influence distributions, the same term has different influence distributions in different documents, and a pre-defined density function may not match a term's actual influence distribution. In this paper, we define a saturated density function as the density function that best fits a given term's influence distribution, and we propose a self-adaptive approach that builds a saturated density function for each term under various circumstances. In particular, our approach uses a Gamma process and is an unsupervised model that requires no external resources. We then construct a density-based weighting method to evaluate the effectiveness of our approach. Finally, we conduct experiments on five standard CLEF and TREC datasets; the results show that our approach is promising and outperforms pre-defined density functions in healthcare retrieval.
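
To make the baseline concrete, the following is a minimal sketch of how a pre-defined density function can spread a term's influence over neighbouring positions in a proximity-based model. It is not the saturated density function or the Gamma-process approach of this paper, whose details are not given in the abstract. The Gaussian and triangle kernels, the function names, the `sigma` width parameter, and the pairwise scoring rule are all assumptions chosen for illustration.

```python
import math


def gaussian_density(distance, sigma):
    """Hypothetical pre-defined kernel: influence decays smoothly with distance."""
    return math.exp(-(distance ** 2) / (2.0 * sigma ** 2))


def triangle_density(distance, sigma):
    """Hypothetical pre-defined kernel: influence decays linearly, zero beyond sigma."""
    return max(0.0, 1.0 - abs(distance) / sigma)


def influence_at(position, occurrences, density, sigma):
    """Total influence that a term's occurrences exert at a given position."""
    return sum(density(position - occ, sigma) for occ in occurrences)


def proximity_score(doc_tokens, query_terms, density=gaussian_density, sigma=2.0):
    """Illustrative score: influence each query term receives from the other query terms.

    The same kernel and the same sigma are used for every term, which is the
    one-size-fits-all assumption the abstract argues against.
    """
    positions = {
        term: [i for i, tok in enumerate(doc_tokens) if tok == term]
        for term in query_terms
    }
    score = 0.0
    for term, occs in positions.items():
        for other, other_occs in positions.items():
            if other == term:
                continue
            score += sum(influence_at(p, other_occs, density, sigma) for p in occs)
    return score


near_doc = ["insulin", "therapy", "for", "diabetes"]
far_doc = ["insulin"] + ["filler"] * 10 + ["diabetes"]
query = ["insulin", "diabetes"]

near_score = proximity_score(near_doc, query)
far_score = proximity_score(far_doc, query)
```

In this sketch the document whose query terms sit close together receives the higher score. A self-adaptive approach would, in effect, replace the fixed `density` and `sigma` with a function fitted per term and per document.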



Acknowledgements

We thank all reviewers who provided thoughtful and constructive comments on this paper. The second author is the corresponding author. This research is supported by the open funds of NPPA Key Laboratory of Publishing Integration Development, ECNUP, the Shanghai Municipal Commission of Economy and Informatization (No. 170513), and Xiaoi Research. The computation was performed at the Supercomputer Center of ECNU.

Author information

Authors and Affiliations

Department of Computer Science and Technology, East China Normal University, Shanghai, 200241, China
Yang Song, Wenxin Hu, Liang He & Liang Dou
NPPA Key Laboratory of Publishing Integration Development, ECNUP, Shanghai, China
Liang He & Liang Dou


Corresponding author

Correspondence to Yang Song.

Editor information

Editors and Affiliations

Hong Kong University of Science and Technology, Hong Kong, China
Qiang Yang
Nanjing University, Nanjing, China
Zhi-Hua Zhou
University of Macau, Taipa, Macau, China
Zhiguo Gong
Southeast University, Nanjing, China
Min-Ling Zhang
Nanjing University of Aeronautics and Astronautics, Nanjing, China
Sheng-Jun Huang


Cite this paper

Song, Y., Hu, W., He, L., Dou, L. (2019). Enhancing the Healthcare Retrieval with a Self-adaptive Saturated Density Function. In: Yang, Q., Zhou, ZH., Gong, Z., Zhang, ML., Huang, SJ. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2019. Lecture Notes in Computer Science, vol 11439. Springer, Cham. https://doi.org/10.1007/978-3-030-16148-4_39


DOI: https://doi.org/10.1007/978-3-030-16148-4_39
Published: 22 March 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16147-7
Online ISBN: 978-3-030-16148-4
eBook Packages: Computer Science, Computer Science (R0)
