Patent expanded retrieval via word embedding under composite-domain perspectives

Wang, Fei; Qian, Tieyun; Liu, Bin; Peng, Zhiyong

doi:10.1007/s11704-018-7056-6

Research Article
Published: 17 June 2019

Volume 13, pages 1048–1061 (2019)
Frontiers of Computer Science

Abstract

Patent prior art search relies on dispersed and highly ambiguous information to retrieve all relevant documents from a massive patent database. This challenging task comprises patent reduction and patent expansion. Existing studies on patent reduction ignore the relevance between technical characteristics and technical domains, and therefore produce ambiguous queries. Work on patent expansion draws terms from external resources by selecting words with similar distributions or similar semantics. However, this separates the distributional relevance of terms from their semantic relevance. Moreover, general-purpose repositories can hardly meet the needs of patent expansion, which involves uncommon senses and unusual terms. To solve these problems, we first present a novel composite-domain perspective model, which maps the technical characteristics of a query patent to a specific composite classified domain and generates aspect queries. We then perform patent expansion with double consistency by combining distributional and semantic evidence simultaneously. We also propose training semantic vector spaces via word embedding under the specific classified domains, so as to provide domain-aware resources for expansion. Finally, the multiple retrieval results for the same topic are merged according to perspective weights and the ranks of documents within each result list. Experimental results on CLEF-IP 2010 demonstrate the effectiveness of our method: it achieves about 5.43% improvement in recall and nearly 12.38% improvement in PRES over the state of the art, and the best overall balance among recall, MAP, and PRES.
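The abstract describes expansion with "double consistency": a candidate term is useful only when it agrees with the query term both distributionally and semantically. The sketch below illustrates that idea under assumptions of our own and is not the authors' method. Distributional evidence is approximated by the Jaccard overlap of the documents containing each term, and semantic evidence by cosine similarity in a domain-specific embedding space (the paper trains such spaces per classified domain via word embedding). The thresholds, the product used as a combined score, and the names `expand_term`, `doc_jaccard`, and `cosine` are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two embedding vectors (semantic evidence)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def doc_jaccard(term_docs: Dict[str, Set[str]], a: str, b: str) -> float:
    """Jaccard overlap of the document sets containing each term (distributional evidence)."""
    docs_a, docs_b = term_docs.get(a, set()), term_docs.get(b, set())
    union = docs_a | docs_b
    return len(docs_a & docs_b) / len(union) if union else 0.0


def expand_term(
    query_term: str,
    domain_vectors: Dict[str, Sequence[float]],
    term_docs: Dict[str, Set[str]],
    min_dist: float = 0.2,
    min_sem: float = 0.5,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return up to top_k candidates passing BOTH thresholds (hypothetical double consistency)."""
    if query_term not in domain_vectors:
        return []
    query_vec = domain_vectors[query_term]
    kept: List[Tuple[str, float]] = []
    for cand, vec in domain_vectors.items():
        if cand == query_term:
            continue
        sem = cosine(query_vec, vec)
        dist = doc_jaccard(term_docs, query_term, cand)
        # Require agreement on both kinds of evidence, not either one alone.
        if sem >= min_sem and dist >= min_dist:
            kept.append((cand, sem * dist))  # assumed combined score
    # Highest combined score first; ties broken alphabetically.
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return kept[:top_k]


if __name__ == "__main__":
    # Toy vectors standing in for one domain-specific embedding space.
    vectors = {
        "valve": [1.0, 0.0],
        "gate": [0.9, 0.1],
        "piston": [0.7, 0.7],
        "spool": [1.0, 0.05],
        "invoice": [0.0, 1.0],
    }
    docs = {
        "valve": {"d1", "d2", "d3"},
        "gate": {"d1", "d2"},
        "piston": {"d3"},
        "spool": {"d7"},
        "invoice": {"d9"},
    }
    print([term for term, _ in expand_term("valve", vectors, docs)])
    # prints ['gate', 'piston']
```

In the toy data, "spool" is semantically very close to "valve" but never shares a document with it, so the distributional check rejects it; a purely semantic expander would have kept it.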

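The abstract states only that the retrieval results of the aspect queries for one topic are merged based on perspective weight and rank. One common way to combine those two signals is a weighted form of reciprocal rank fusion, in which each perspective p adds w_p / (k + rank_p(d)) to the score of document d. The sketch below uses that scheme as a swapped-in illustration, not as the paper's formula; the function name `merge_results`, the example weights, and the constant k = 60 (the value usually used with standard reciprocal rank fusion) are assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def merge_results(
    ranked_lists: Dict[str, List[str]],
    weights: Dict[str, float],
    k: float = 60.0,
) -> List[str]:
    """Merge per-perspective rankings with weighted reciprocal rank fusion (assumed scheme)."""
    scores: Dict[str, float] = defaultdict(float)
    for perspective, docs in ranked_lists.items():
        weight = weights.get(perspective, 0.0)
        for rank, doc in enumerate(docs, start=1):
            # A higher perspective weight and a better (smaller) rank both raise the score.
            scores[doc] += weight / (k + rank)
    # Sort by descending fused score; break ties by document id for determinism.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


if __name__ == "__main__":
    runs = {
        "aspect_A": ["d1", "d2", "d3"],
        "aspect_B": ["d2", "d4"],
    }
    print(merge_results(runs, {"aspect_A": 0.7, "aspect_B": 0.3}))
    # prints ['d2', 'd1', 'd3', 'd4']
```

Here d2 wins because it is retrieved by both perspectives, even though it is ranked first only in the lower-weighted one; raising the weight of aspect_B would also pull d4 above d1.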
Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61232002, 61572376), the Science and Technology Support Program of Hubei Province (2015BAA127) and the Wuhan Innovation Team Project (2014070504020237).

Author information

Authors and Affiliations

School of Computer Science, Wuhan University, Wuhan, 430072, China
Fei Wang, Tieyun Qian, Bin Liu & Zhiyong Peng
State Key Lab of Software Engineering, Wuhan University, Wuhan, 430072, China
Fei Wang, Tieyun Qian, Bin Liu & Zhiyong Peng

Corresponding authors

Correspondence to Tieyun Qian or Zhiyong Peng.

Additional information

Fei Wang is a PhD candidate at the School of Computer Science, Wuhan University, China. His current research interests include databases, complex data management, patent mining, information retrieval, and natural language processing. He received his ME degree in Computer Science from Chengdu University of Information Technology, China in 2014.

Tieyun Qian is a professor at the State Key Laboratory of Software Engineering, Wuhan University, China. She received her BS degree in computer science from Wuhan University of Technology, China in 1991, and her PhD degree in computer science from Huazhong University of Science and Technology, China in 2006. Her current research interests include text mining, Web mining, and natural language processing. She has published over 30 papers in leading conferences such as ACL, EMNLP, and SIGIR. She is a member of CCF and ACM. She has served as a program committee member for many premier conferences, including WWW, COLING, DASFAA, WAIM, and APWeb.

Bin Liu is a lecturer at the School of Computer Science, Wuhan University, China. He received his BS, ME, and PhD degrees in Computer Science from Wuhan University, China. His current research interests include databases, data mining, complex data management, and natural language processing.

Zhiyong Peng received his BS and ME degrees in Computer Science from Wuhan University and Changsha Institute of Technology of China, respectively, and his PhD degree from Kyoto University, Japan in 1995. He is a professor at Wuhan University. Before joining Wuhan University in 2000, he worked as a researcher at the Advanced Software Technology and Mechatronics Research Institute of Kyoto from 1995 to 1997 and was a member of the technical staff at Hewlett-Packard Laboratories, Japan from 1997 to 2000. His current research interests include databases, trusted data management, and complex data management.

Electronic supplementary material

Supplementary material, approximately 185 KB.

About this article

Cite this article

Wang, F., Qian, T., Liu, B. et al. Patent expanded retrieval via word embedding under composite-domain perspectives. Front. Comput. Sci. 13, 1048–1061 (2019). https://doi.org/10.1007/s11704-018-7056-6

Received: 16 February 2017
Accepted: 08 September 2017
Published: 17 June 2019
Issue Date: October 2019
DOI: https://doi.org/10.1007/s11704-018-7056-6