Patent expanded retrieval via word embedding under composite-domain perspectives

  • Research Article
Frontiers of Computer Science

Abstract

Patent prior art search uses dispersed information to retrieve all relevant documents, often under strong ambiguity, from a massive patent database. This challenging task consists of patent reduction and patent expansion. Existing studies on patent reduction ignore the relevance between technical characteristics and technical domains, which results in ambiguous queries. Work on patent expansion draws terms from external resources by selecting words with either similar distribution or similar semantics; this separates the distribution of a term from its semantics. Moreover, common repositories can hardly meet the needs of patent expansion for uncommon semantics and unusual terms. To address these problems, we first present a novel composite-domain perspective model that converts the technical characteristics of a query patent into a specific composite classified domain and generates aspect queries. We then implement patent expansion with double consistency by combining distribution and semantics simultaneously. We also propose training semantic vector spaces via word embedding under the specific classified domains, so as to provide a domain-aware expansion resource. Finally, multiple retrieval results for the same topic are merged based on perspective weight and rank in the results. Our experimental results on CLEF-IP 2010 demonstrate that our method is effective: it achieves about 5.43% improvement in recall and nearly 12.38% improvement in PRES over the state of the art. Our work also achieves the best performance balance in terms of recall, MAP and PRES.
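
The final merging step can be pictured with a small sketch. The abstract says only that results are merged "based on perspective weight and rank"; it gives no formula. The code below therefore swaps in a generic weighted reciprocal-rank fusion as a stand-in. The function name `merge_ranked_lists`, the smoothing constant `k`, the perspective names, the weights, and the document ids are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def merge_ranked_lists(ranked_lists, weights, k=60):
    """Fuse several ranked lists with weighted reciprocal rank (an assumed scheme).

    ranked_lists: dict mapping a perspective name to a list of document ids,
                  best first.
    weights:      dict mapping the same perspective names to a weight.
    k:            smoothing constant; the value 60 is an assumption.
    """
    scores = defaultdict(float)
    for perspective, docs in ranked_lists.items():
        w = weights[perspective]
        for rank, doc_id in enumerate(docs, start=1):
            # A document gains more from a high rank in a heavily weighted list.
            scores[doc_id] += w / (k + rank)
    # Sort by fused score, highest first; break ties by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results of two aspect queries for the same topic.
results = {
    "domain_a": ["EP1", "EP2", "EP3"],
    "domain_b": ["EP3", "EP1", "EP4"],
}
weights = {"domain_a": 0.7, "domain_b": 0.3}
merged = merge_ranked_lists(results, weights)
print(merged)
```

A document that appears in several perspective lists accumulates score from each, so agreement across perspectives is rewarded, while the weights let a more trusted perspective dominate. Whether the paper's merging behaves this way cannot be confirmed from the abstract alone.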

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61232002, 61572376), the Science and Technology Support Program of Hubei Province (2015BAA127) and the Wuhan Innovation Team Project (2014070504020237).

Author information

Corresponding authors

Correspondence to Tieyun Qian or Zhiyong Peng.

Additional information

Fei Wang is a PhD candidate at the School of Computer Science, Wuhan University, China. His current research interests include databases, complex data management, patent mining, information retrieval and natural language processing. He received his ME degree in Computer Science from Chengdu University of Information Technology, China in 2014.

Tieyun Qian is a professor at the State Key Laboratory of Software Engineering at Wuhan University, China. She received her BS degree in computer science from Wuhan University of Technology, China in 1991, and her PhD degree in computer science from Huazhong University of Science and Technology, China in 2006. Her current research interests include text mining, Web mining, and natural language processing. She has published over 30 papers in leading conferences including ACL, EMNLP, SIGIR, etc. She is a member of CCF and ACM. She has served as a program committee member of many premier conferences, including WWW, COLING, DASFAA, WAIM, and APWeb.

Bin Liu is a lecturer at the School of Computer Science, Wuhan University, China. He received his BS, ME, and PhD degrees in Computer Science from Wuhan University, China. His current research interests include databases, data mining, complex data management and natural language processing.

Zhiyong Peng received his BS and ME degrees in Computer Science from Wuhan University and Changsha Institute of Technology, China, respectively. He received his PhD degree from Kyoto University, Japan in 1995. He is a professor at Wuhan University. Prior to joining Wuhan University in 2000, he worked as a researcher at the Advanced Software Technology and Mechatronics Research Institute of Kyoto from 1995 to 1997 and was a member of the technical staff at Hewlett-Packard Laboratories, Japan from 1997 to 2000. His current research interests include databases, trusted data management, and complex data management.

About this article

Cite this article

Wang, F., Qian, T., Liu, B. et al. Patent expanded retrieval via word embedding under composite-domain perspectives. Front. Comput. Sci. 13, 1048–1061 (2019). https://doi.org/10.1007/s11704-018-7056-6

  • DOI: https://doi.org/10.1007/s11704-018-7056-6
