Towards Privacy-Preserving Evaluation for Information Retrieval Models Over Industry Data Sets

  • Conference paper
Information Retrieval Technology (AIRS 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10648)

Abstract

The development of Information Retrieval (IR) techniques depends heavily on empirical studies over real-world data collections. Unfortunately, such data sets are often unavailable to researchers because of privacy concerns. In fact, the lack of publicly available industry data sets has become a serious bottleneck for IR research. To address this problem, we propose to bridge the gap between academic research and industry data sets through a privacy-preserving evaluation platform. The novelty of the platform lies in its “data-centric” mechanism: the data stay on a secure server, and the IR algorithms to be evaluated are uploaded to that server. The platform runs the algorithms' code and returns the evaluation results. Preliminary experiments with retrieval models reveal new observations and insights about state-of-the-art retrieval models, demonstrating the value of an industry data set.
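
The “data-centric” mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' platform: the class `PrivateEvaluator`, the scorer signature, the toy collection, and the choice of mean average precision (MAP) as the returned metric are all assumptions made for illustration. The sketch also assumes that an uploaded algorithm sees only per-term statistics and that the caller receives a single aggregate score; a real deployment would additionally need to sandbox the uploaded code, which is omitted here.

```python
import math
from typing import Callable, Dict, List, Set

# Hypothetical interface for an uploaded ranking function. It receives only
# term statistics (tf, doc_len, df, avg_doc_len, num_docs), never raw text.
Scorer = Callable[[int, int, int, float, int], float]


class PrivateEvaluator:
    """Toy server-side evaluator: the data never leave this object."""

    def __init__(self, docs: Dict[str, List[str]],
                 queries: Dict[str, List[str]],
                 qrels: Dict[str, Set[str]]) -> None:
        self._docs = docs          # doc id -> tokens (private)
        self._queries = queries    # query id -> tokens (private)
        self._qrels = qrels        # query id -> relevant doc ids (private)
        self._num_docs = len(docs)
        self._avg_len = sum(len(d) for d in docs.values()) / len(docs)
        self._df: Dict[str, int] = {}
        for terms in docs.values():
            for term in set(terms):
                self._df[term] = self._df.get(term, 0) + 1

    def _rank(self, scorer: Scorer, q_terms: List[str]) -> List[str]:
        scores = {}
        for doc_id, terms in self._docs.items():
            score = 0.0
            for term in q_terms:
                tf = terms.count(term)
                if tf > 0:
                    score += scorer(tf, len(terms), self._df[term],
                                    self._avg_len, self._num_docs)
            scores[doc_id] = score
        # Highest score first; ties are broken by doc id for determinism.
        return sorted(scores, key=lambda d: (-scores[d], d))

    def evaluate(self, scorer: Scorer) -> float:
        """Run the uploaded scorer and return MAP as the only output."""
        average_precisions = []
        for qid, q_terms in self._queries.items():
            relevant = self._qrels.get(qid, set())
            hits, precision_sum = 0, 0.0
            for rank, doc_id in enumerate(self._rank(scorer, q_terms), 1):
                if doc_id in relevant:
                    hits += 1
                    precision_sum += hits / rank
            average_precisions.append(
                precision_sum / len(relevant) if relevant else 0.0)
        return sum(average_precisions) / len(average_precisions)


def tf_idf(tf: int, doc_len: int, df: int,
           avg_doc_len: float, num_docs: int) -> float:
    """Example "uploaded" algorithm: a plain TF-IDF term weight."""
    return tf * math.log((num_docs + 1) / df)


def toy_evaluator() -> PrivateEvaluator:
    """Build an evaluator over a made-up three-document collection."""
    docs = {"d1": ["privacy", "retrieval"],
            "d2": ["retrieval", "model"],
            "d3": ["data", "set"]}
    queries = {"q1": ["privacy"], "q2": ["retrieval", "model"]}
    qrels = {"q1": {"d1"}, "q2": {"d2"}}
    return PrivateEvaluator(docs, queries, qrels)
```

A researcher would submit a function such as `tf_idf` and receive the number returned by `toy_evaluator().evaluate(tf_idf)`, without access to the documents, queries, or relevance judgments held by the evaluator.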

Acknowledgments

This research was supported by the U.S. National Science Foundation under IIS-1423002.

Author information

Corresponding author

Correspondence to Peilin Yang.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Yang, P., Zhou, M., Chang, Y., Zhai, C., Fang, H. (2017). Towards Privacy-Preserving Evaluation for Information Retrieval Models Over Industry Data Sets. In: Sung, WK., et al. Information Retrieval Technology. AIRS 2017. Lecture Notes in Computer Science, vol 10648. Springer, Cham. https://doi.org/10.1007/978-3-319-70145-5_16

  • DOI: https://doi.org/10.1007/978-3-319-70145-5_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-70144-8

  • Online ISBN: 978-3-319-70145-5

  • eBook Packages: Computer Science, Computer Science (R0)
