Result Set Diversification in Digital Libraries Through the Use of Paper’s Claims

  • Conference paper
Digital Libraries: Data, Information, and Knowledge for Digital Lives (ICADL 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10647)

Abstract

Understanding the possible associations between two entities in a query is a hard problem. For instance, the query “coffee” and “cancer” challenges a retrieval system even in a curated Digital Library, because the system struggles to infer the intention behind the query. Perhaps the user wants a consensus of what is known? But how many different associations exist, and how can they all be found? Herein we introduce an approach that diversifies the results retrieved for such queries by re-ranking the result list. Our re-ranking specifically models one fundamental aspect of scientific papers: claims, the sentences that scientists use to report findings. In particular, we study claims that express associations between entities in the medical domain, focusing on queries that involve two entities, one of which has some effect on a disease. To assess our proposed solution empirically, we work on a corpus obtained by querying PubMed. Moreover, we promote the idea of claims as an explicit key aspect for diversifying the result set of a query. Our approach represents claims using neural embeddings of word vectors and implements an algorithm that re-ranks the result set of a query. We empirically show its potential to ease the process of discovering representative associations between entities.
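The abstract does not spell out the re-ranking algorithm, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes each retrieved paper is represented by one claim, that a claim vector is the mean of its word vectors (a simple stand-in for the neural embeddings mentioned above), and that diversification uses the greedy Maximal Marginal Relevance (MMR) criterion of Carbonell and Goldstein (1998). MMR is a well-known, swapped-in technique here, not necessarily the authors' algorithm. The names `embed_claim`, `mmr_rerank`, `lam`, and the toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def embed_claim(claim: str, word_vectors: Dict[str, Vector], dim: int) -> Vector:
    """Represent a claim as the mean of its in-vocabulary word vectors.

    Hypothetical helper: uses a naive whitespace tokenizer with punctuation
    stripping; a real system would use proper tokenization and pre-trained
    (e.g. biomedical) embeddings.
    """
    tokens = [t.strip(".,;:()").lower() for t in claim.split()]
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        return [0.0] * dim  # no known words: fall back to the zero vector
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def mmr_rerank(
    relevance: Sequence[float],
    claim_vecs: Sequence[Vector],
    lam: float = 0.5,
    k: Optional[int] = None,
) -> List[int]:
    """Greedy MMR: trade off relevance against similarity to already-picked claims.

    Returns document indices in the new order. lam=1.0 reproduces the
    original relevance ranking; smaller lam rewards novel claims more.
    """
    n = len(relevance)
    k = n if k is None else min(k, n)
    selected: List[int] = []
    remaining = list(range(n))

    def mmr_score(i: int) -> float:
        # Redundancy = highest similarity to any claim already in the result list.
        redundancy = max(
            (cosine(claim_vecs[i], claim_vecs[j]) for j in selected), default=0.0
        )
        return lam * relevance[i] - (1.0 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Hypothetical 3-dimensional toy vectors, not real embeddings.
    toy_vectors = {
        "coffee": [1.0, 0.0, 0.0],
        "reduces": [0.0, 1.0, 0.0],
        "increases": [0.0, -1.0, 0.0],
        "risk": [0.0, 0.0, 1.0],
    }
    claims = ["Coffee reduces risk", "Coffee reduces risk.", "Coffee increases risk"]
    relevance = [0.9, 0.85, 0.6]
    vecs = [embed_claim(c, toy_vectors, dim=3) for c in claims]
    print(mmr_rerank(relevance, vecs, lam=0.5))  # [0, 2, 1]
    print(mmr_rerank(relevance, vecs, lam=1.0))  # [0, 1, 2]
```

In this toy run, ranking purely by relevance (`lam=1.0`) places the near-duplicate claim second. With `lam=0.5`, the claim expressing a different association ("increases") moves ahead of the repeat. This illustrates the kind of behavior the abstract aims for: surfacing distinct associations rather than restatements of the same one.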

Author information

Correspondence to José María González Pinto.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

González Pinto, J.M., Balke, WT. (2017). Result Set Diversification in Digital Libraries Through the Use of Paper’s Claims. In: Choemprayong, S., Crestani, F., Cunningham, S. (eds) Digital Libraries: Data, Information, and Knowledge for Digital Lives. ICADL 2017. Lecture Notes in Computer Science, vol 10647. Springer, Cham. https://doi.org/10.1007/978-3-319-70232-2_19

  • DOI: https://doi.org/10.1007/978-3-319-70232-2_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-70231-5

  • Online ISBN: 978-3-319-70232-2

  • eBook Packages: Computer Science; Computer Science (R0)
