What Should I Cite? Cross-Collection Reference Recommendation of Patents and Papers

Risch, Julian; Krestel, Ralf

doi:10.1007/978-3-319-67008-9_4

Conference paper
First Online: 02 September 2017

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10450)

Abstract

Research results manifest in large corpora of patents and scientific papers. However, both corpora lack a consistent taxonomy, and references across different document types are sparse. Because of this, and because of contrastive, domain-specific language, recommending similar papers for a given patent (or vice versa) is challenging.

We propose a recommender system that leverages topic distributions and keywords to recommend related work despite these challenges. As a case study, we evaluate our approach on patents and papers of two fields: medical and computer science. We find that topic-based recommenders complement word-based recommenders for documents with collection-specific language and increase mean average precision by up to 27%. As a result of our work, publications from both corpora form a joint digital library, which connects academia and industry.
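The abstract does not say how topic-based and word-based scores are combined, so the following is only a minimal sketch under stated assumptions: each document is represented by a topic distribution and a keyword list (hypothetical keys `topics` and `keywords`), topic similarity is cosine similarity, the word-based score is a simple Jaccard overlap standing in for a real search-engine score, and the two are mixed by a hypothetical interpolation `weight`. It also shows how mean average precision (MAP), the metric named in the abstract, is computed over ranked recommendation lists. None of the function names or toy data come from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length dense vectors (e.g. topic distributions)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def keyword_overlap(a, b):
    """Jaccard overlap of two keyword sets: a hypothetical stand-in for a word-based score."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hybrid_rank(query, candidates, weight=0.5):
    """Rank candidate ids by weight * topic similarity + (1 - weight) * keyword overlap.

    `weight` is an assumed interpolation parameter, not a value from the paper.
    Ties are broken by document id so the ranking is deterministic.
    """
    scored = []
    for doc_id, doc in candidates.items():
        score = (weight * cosine(query["topics"], doc["topics"])
                 + (1 - weight) * keyword_overlap(query["keywords"], doc["keywords"]))
        scored.append((score, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


def average_precision(ranking, relevant):
    """Mean of precision@k over the ranks k at which a relevant document appears."""
    hits, total = 0, 0.0
    for k, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """MAP over a list of (ranking, relevant_set) pairs, one pair per query document."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy example: a patent-like query and three candidate papers (invented data).
query = {"topics": [0.7, 0.2, 0.1], "keywords": ["neural", "network", "image"]}
candidates = {
    "paper_a": {"topics": [0.6, 0.3, 0.1], "keywords": ["neural", "image", "classification"]},
    "paper_b": {"topics": [0.1, 0.1, 0.8], "keywords": ["drug", "dosage"]},
    "paper_c": {"topics": [0.5, 0.4, 0.1], "keywords": ["network", "routing"]},
}
ranking = hybrid_rank(query, candidates)
print(ranking)  # ['paper_a', 'paper_c', 'paper_b']
print(average_precision(ranking, {"paper_a", "paper_b"}))  # about 0.833
```

Setting `weight=1.0` reduces the sketch to a purely topic-based recommender and `weight=0.0` to a purely word-based one, which makes it easy to compare how much each signal contributes to MAP on a given collection.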

Notes

1. http://www.wipo.int/edocs/pubdocs/en/wipo_pub_941_2016.pdf
2. https://scholar.google.com
3. Larger keyword vectors increase runtime but do not improve result quality.
4. https://www.elastic.co/products/elasticsearch
5. Parameters set as suggested in the original paper: \(\beta = 0.01\), \(\delta = 0.01\), \(\gamma_1 = 1\), \(\gamma_2 = 1\).
6. https://bulkdata.uspto.gov/ and https://aminer.org/citation
7. https://exporter.nih.gov/

Author information

Authors and Affiliations

Hasso-Plattner-Institut, Prof.-Dr.-Helmert-Str. 2–3, 14482, Potsdam, Germany
Julian Risch & Ralf Krestel

Corresponding author

Correspondence to Julian Risch.

Editor information

Editors and Affiliations

Faculteit der Geesteswetenschappen, Universiteit van Amsterdam, Amsterdam, The Netherlands
Jaap Kamps
Library & Information Center, University of Patras, Patras, Greece
Giannis Tsakonas
Aristotle University of Thessaloniki, Thessaloniki, Greece
Yannis Manolopoulos
Civil Engineering, University of Thrace, Kimmeria, Greece
Lazaros Iliadis
Informatics, Ionian University, Kerkyra, Greece
Ioannis Karydis

About this paper

Cite this paper

Risch, J., Krestel, R. (2017). What Should I Cite? Cross-Collection Reference Recommendation of Patents and Papers. In: Kamps, J., Tsakonas, G., Manolopoulos, Y., Iliadis, L., Karydis, I. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2017. Lecture Notes in Computer Science, vol 10450. Springer, Cham. https://doi.org/10.1007/978-3-319-67008-9_4

DOI: https://doi.org/10.1007/978-3-319-67008-9_4
Published: 02 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67007-2
Online ISBN: 978-3-319-67008-9
eBook Packages: Computer Science, Computer Science (R0)
