What Should I Cite? Cross-Collection Reference Recommendation of Patents and Papers

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10450)

Abstract

Research results are documented in large corpora of patents and scientific papers. However, both corpora lack a consistent taxonomy, and references across document types are sparse. These gaps, together with the contrasting, domain-specific language of the two collections, make it challenging to recommend similar papers for a given patent (or vice versa).

We propose a recommender system that leverages topic distributions and keywords to recommend related work despite these challenges. As a case study, we evaluate our approach on patents and papers from two fields: medicine and computer science. We find that topic-based recommenders complement word-based recommenders for documents with collection-specific language, increasing mean average precision by up to 27%. As a result of our work, publications from both corpora form a joint digital library that connects academia and industry.
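The abstract does not say how keyword and topic signals are combined, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each document is represented as a sparse keyword weight vector (for example TF-IDF weights) and as a topic distribution from some topic model, uses cosine similarity for keywords and one minus the Hellinger distance for topics, and linearly interpolates the two with a hypothetical parameter `weight`. It also computes mean average precision, the measure the abstract reports. All function names and the toy data are invented for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse keyword vectors (dict: term -> weight)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def topic_similarity(p, q):
    """1 minus the Hellinger distance between two topic distributions."""
    bc = sum(math.sqrt(x * y) for x, y in zip(p, q))  # Bhattacharyya coefficient
    return 1.0 - math.sqrt(max(0.0, 1.0 - bc))  # max() guards against rounding above 1


def hybrid_score(query, candidate, weight=0.5):
    """Hypothetical interpolation: weight * keyword sim + (1 - weight) * topic sim."""
    return (weight * cosine(query["keywords"], candidate["keywords"])
            + (1.0 - weight) * topic_similarity(query["topics"], candidate["topics"]))


def recommend(query, candidates, k=10, weight=0.5):
    """Return the ids of the k highest-scoring candidates (ties keep input order)."""
    ranked = sorted(candidates, key=lambda c: hybrid_score(query, c, weight), reverse=True)
    return [c["id"] for c in ranked[:k]]


def average_precision(ranked_ids, relevant_ids):
    """Sum of precision at each relevant hit, divided by the number of relevant docs."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0


def mean_average_precision(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one per query document."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs) if runs else 0.0


# Invented toy data: one patent as the query, three candidate papers.
patent = {"id": "P1", "keywords": {"neural": 0.8, "network": 0.6}, "topics": [0.7, 0.2, 0.1]}
papers = [
    {"id": "A", "keywords": {"neural": 0.5, "network": 0.5}, "topics": [0.6, 0.3, 0.1]},
    {"id": "B", "keywords": {"protein": 1.0}, "topics": [0.1, 0.1, 0.8]},
    {"id": "C", "keywords": {"learning": 1.0}, "topics": [0.65, 0.25, 0.1]},
]
relevant = {"A", "C"}

hybrid = recommend(patent, papers, k=3)                    # ['A', 'C', 'B']
keyword_only = recommend(patent, papers, k=3, weight=1.0)  # ['A', 'B', 'C']
print(hybrid, mean_average_precision([(hybrid, relevant)]))              # MAP 1.0
print(keyword_only, mean_average_precision([(keyword_only, relevant)]))  # MAP ~0.833
```

In this toy example, paper C shares no keywords with the patent, so the keyword-only ranking cannot separate it from the irrelevant paper B, while its similar topic distribution lifts it to second place in the hybrid ranking. This mirrors, on invented data, the kind of complementarity the abstract describes.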


Notes

  1. http://www.wipo.int/edocs/pubdocs/en/wipo_pub_941_2016.pdf.

  2. https://scholar.google.com.

  3. Larger keyword vectors increase runtime but do not improve result quality.

  4. https://www.elastic.co/products/elasticsearch.

  5. Parameters set as suggested in the original paper: \(\beta = 0.01\), \(\delta = 0.01\), \(\gamma_1 = 1\), \(\gamma_2 = 1\).

  6. https://bulkdata.uspto.gov/ and https://aminer.org/citation.

  7. https://exporter.nih.gov/.



Author information

Correspondence to Julian Risch.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Risch, J., Krestel, R. (2017). What Should I Cite? Cross-Collection Reference Recommendation of Patents and Papers. In: Kamps, J., Tsakonas, G., Manolopoulos, Y., Iliadis, L., Karydis, I. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2017. Lecture Notes in Computer Science, vol 10450. Springer, Cham. https://doi.org/10.1007/978-3-319-67008-9_4

  • DOI: https://doi.org/10.1007/978-3-319-67008-9_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67007-2

  • Online ISBN: 978-3-319-67008-9

  • eBook Packages: Computer Science, Computer Science (R0)
