Exploiting word embedding for heterogeneous topic model towards patent recommendation

Scientometrics

Abstract

Patent recommendation aims to recommend patent documents whose content is similar to that of a given target patent. With the explosive growth in patent applications, recommending relevant patents from a massive collection has become an extremely challenging problem. The main obstacle is distinguishing the meanings of the same word in different contexts and associating multiple words that express the same meaning. In this paper, we propose HTW, a Heterogeneous Topic model that exploits Word embedding to enhance word semantics. First, we model the relationships among text, inventors, and applicants around the topic to build a heterogeneous topic model, and learn a patent feature representation that captures contextual word semantics. Second, we construct a word embedding to extract deep semantics for associating multiple words that express the same meaning. Finally, with words as connections, a matrix operation maps patent feature representations to patent embeddings, integrating the information in the word embedding and the patent feature representation. HTW thus accounts for the heterogeneity of patents while simultaneously enhancing both the distinction and the association among words. Experimental results on real-world datasets show that HTW outperforms typical keyword-based methods, topic models, and embedding models on patent recommendation.
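The abstract does not state the exact matrix operation, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a topic model has already produced patent-by-topic and topic-by-word weights, and that a word embedding is available as a word-by-dimension matrix; chaining these through the shared vocabulary yields one vector per patent, and patents are then ranked by cosine similarity. The names `theta`, `phi`, `word_vecs`, and `recommend`, and all numeric values, are hypothetical and chosen for illustration.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

# Toy inputs (all values invented for illustration, not taken from the paper).
# theta: patent-by-topic weights (3 patents, 2 topics).
theta = [[0.9, 0.1],
         [0.8, 0.2],
         [0.1, 0.9]]
# phi: topic-by-word weights (2 topics, 4 vocabulary words).
phi = [[0.5, 0.4, 0.05, 0.05],
       [0.05, 0.05, 0.5, 0.4]]
# word_vecs: word-by-dimension embedding (4 words, 2 dimensions).
word_vecs = [[1.0, 0.0],
             [0.9, 0.1],
             [0.0, 1.0],
             [0.1, 0.9]]

# Words are the shared index that connects the two models:
# (patents x topics) @ (topics x words) @ (words x dims) -> (patents x dims).
patent_word = matmul(theta, phi)
patent_vecs = matmul(patent_word, word_vecs)

def recommend(target, k=1):
    """Return indices of the k patents most similar to the target patent."""
    scores = [(cosine(patent_vecs[target], vec), i)
              for i, vec in enumerate(patent_vecs) if i != target]
    return [i for _, i in sorted(scores, reverse=True)[:k]]

print(recommend(0))
```

In this toy setup, patents 0 and 1 lean on the same topic and therefore land close together in the embedding space, while patent 2 sits apart. A real system would replace the hand-written matrices with outputs of a trained topic model and a trained word embedding.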

Acknowledgements

This work is supported by the National Key Research and Development Program of China (2017YFB1401903), the National Natural Science Foundation of China (Grants #61876001 and #61673020), the Major Program of the National Social Science Foundation of China (Grant No. 18ZDA032), and the Provincial Natural Science Foundation of Anhui Province (#1708085QF156).

Author information

Corresponding author

Correspondence to Shu Zhao.

About this article

Cite this article

Chen, J., Chen, J., Zhao, S. et al. Exploiting word embedding for heterogeneous topic model towards patent recommendation. Scientometrics 125, 2091–2108 (2020). https://doi.org/10.1007/s11192-020-03666-4

  • DOI: https://doi.org/10.1007/s11192-020-03666-4
