PatSearch: an integrated framework for patentability retrieval

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Patent retrieval focuses on finding legal documents relevant to a given query, and the retrieval process can differ significantly depending on the purpose of the task. Given a patent application, it is challenging to determine its patentability, i.e., to decide whether a similar invention has already been published. For this task, it is more important to retrieve all potentially relevant documents than only a small subset of top-ranked patents. However, patents are often lengthy and rich in technical terms, so comparing a given patent application with the retrieved results often requires enormous human effort. To this end, we propose an integrated framework, PatSearch, which automatically transforms a patent application into a reasonable and effective search query. The framework first extracts representative yet distinguishable terms from the application to generate an initial search query and then expands the query by combining content proximity with topic relevance. A list of relevant patent documents is then retrieved using the generated queries, providing enough information to assist patent analysts in making the patentability decision. Finally, a comparative summary is generated to help patent analysts quickly review the retrieved results against the patent application. Extensive quantitative analysis and case studies on real-world patent documents demonstrate the effectiveness of the proposed approach.
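
As a rough illustration of the first step (choosing terms that are both representative of the application and distinguishable from other documents), the sketch below scores terms with a plain smoothed TF-IDF weight. This is an assumption made for illustration only: the abstract does not specify the scoring function PatSearch uses, and the names `tokenize` and `initial_query` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters; a deliberately crude tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def initial_query(application, background_docs, k=5):
    """Return the k highest-scoring terms of the application.

    A term scores high when it is frequent in the application
    (representative) and rare in the background collection
    (distinguishable). TF-IDF is a stand-in here, not the
    paper's actual term-selection method.
    """
    tf = Counter(tokenize(application))
    doc_sets = [set(tokenize(d)) for d in background_docs]
    n = len(doc_sets)
    scores = {}
    for term, freq in tf.items():
        df = sum(1 for s in doc_sets if term in s)
        idf = math.log((n + 1) / (df + 1)) + 1.0  # smoothed IDF
        scores[term] = freq * idf
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


application = "a rotor blade with a heated rotor blade edge"
background = ["a blade for a knife", "a method with a heated plate"]
print(initial_query(application, background, k=3))
```

In this toy example, terms that occur often in the application but in few background documents rank first. The query expansion step (content proximity and topic relevance) is not sketched, because the abstract gives no detail on how the two signals are combined.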

Notes

  1. https://www.uspto.gov/web/offices/pac/mpep/.

  2. https://www.uspto.gov/patent/laws-regulations-policies-procedures-guidance-and-training/.

  3. http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/.

  4. In the experiment, we empirically set \(\delta\) to 0.5.

  5. In the experiment, we set \(\lambda\) to 0.3, as suggested in [14].

  6. In the experiment, we empirically set \(\tau\) to 0.1.

  7. http://www.uspto.gov/.

  8. http://lucene.apache.org/.

  9. http://deeplearning4j.org/.

  10. http://mallet.cs.umass.edu/.

References

  1. Alberts D, Yang CB, Fobare-DePonio D, Koubek K, Robins S, Rodgers M, Simmons E, DeMarco D (2017) Introduction to patent searching. In: Lupu M, Mayer K, Kando N, Trippe A (eds) Current challenges in patent information retrieval. The information retrieval series, vol 37. Springer, Berlin, Heidelberg

  2. Atsushi H, Yukawa T (2004) Patent map generation using concept-based vector space model. Working notes of NTCIR-4, Tokyo, pp 2–4

  3. Blei DM, Ng AY, Jordan MI (2003) Latent dirichlet allocation. J Mach Learn Res 3:993–1022

  4. Bouadjenek MR, Sanner S, Ferraro G (2015) A study of query reformulation for patent prior art search with partial patent applications. In: Proceedings of the 15th international conference on artificial intelligence and law. ACM, pp 23–32

  5. Charikar M, Chekuri C, Goel A, Guha S (1998) Rounding via trees: deterministic approximation algorithms for group Steiner trees and \(k\)-median. In: Proceedings of the 30th annual ACM symposium on theory of computing. ACM, pp 114–123

  6. Chvatal V (1979) A greedy heuristic for the set-covering problem. Math Oper Res 4(3):233–235

  7. Fujii A (2007) Enhancing patent retrieval by citation analysis. In: Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 793–794

  8. Golestan Far M, Sanner S, Bouadjenek MR, Ferraro G, Hawking D (2015) On term selection techniques for patent prior art search. In: Proceedings of the 38th international ACM SIGIR conference on research and development in information retrieval. ACM, pp 803–806

  9. Hiemstra D, Robertson S, Zaragoza H (2004) Parsimonious language models for information retrieval. In: Proceedings of the 27th annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 178–185

  10. Huang X, Wan X, Xiao J (2011) Comparative news summarization using linear programming. ACL-HLT, ACL, pp 648–653

  11. Joho H, Azzopardi LA, Vanderbauwhede W (2010) A survey of patent users: an analysis of tasks, behavior, search functionality and system requirements. In: Proceedings of the 3rd symposium on information interaction in context. ACM, pp 13–24

  12. Karp RM (1972) Reducibility among combinatorial problems. Springer, Berlin

  13. Kishida K (2003) Experiment on pseudo relevance feedback method using Taylor formula at NTCIR-3 patent retrieval task. In: Proceedings of the 3rd NTCIR workshop on research in information retrieval, automatic text summarization and question answering. NII, Tokyo. http://research.nii.ac.jp/ntcir

  14. Krestel R, Smyth P (2013) Recommending patents based on latent topics. In: Proceedings of the 7th ACM conference on recommender systems. ACM, pp 395–398

  15. Lin C-Y, Hovy E (2003) Automatic evaluation of summaries using \(n\)-gram co-occurrence statistics. NAACL-HLT, ACL, pp 71–78

  16. Lupu M, Hanbury A et al (2013) Patent retrieval. Found Trends Inf Retr 7(1):1–97

  17. Lupu M, Mayer K, Tait J, Trippe AJ (2011) Current challenges in patent information retrieval. Springer Science & Business Media, Berlin

  18. Magdy W (2012) Toward higher effectiveness for recall-oriented information retrieval: a patent retrieval case study. PhD thesis, Dublin City University

  19. Magdy W, Jones G (2011) A study on query expansion methods for patent retrieval. In: Proceedings of the 4th workshop on patent information retrieval. ACM, pp 19–24

  20. Magdy W, Leveling J, Jones GJF (2009) Exploring structured documents and query formulation techniques for patent retrieval. In: Peters C, Di Nunzio GM, Kurimo M, Mandl T, Mostefa D (eds) Proceedings of the 10th cross-language evaluation forum conference on Multilingual information access evaluation: text retrieval experiments (CLEF’09). Springer-Verlag, Berlin, Heidelberg, pp 410–417

  21. Mahdabi P, Andersson L, Keikha M, Crestani F (2012) Automatic refinement of patent queries using concept importance predictors. In: Proceedings of the 35th international ACM SIGIR conference on research and development in information retrieval. ACM, pp 505–514

  22. Mahdabi P, Crestani F (2014) Patent query formulation by synthesizing multiple sources of relevance evidence. ACM Trans Inf Syst (TOIS) 32(4):16

  23. Mahdabi P, Gerani S, Huang JX, Crestani F (2013) Leveraging conceptual lexicon: query disambiguation using proximity information for patent retrieval. In: Proceedings of the 36th international ACM SIGIR conference on research and development in information retrieval. ACM, pp 113–122

  24. Rauber A, de Vries AP (eds) Multidisciplinary information retrieval. IRFC 2011. Lecture Notes in Computer Science, vol 6653. Springer, Berlin, Heidelberg

  25. Manning C, Raghavan P, Schütze H (2008) Introduction to information retrieval, vol 1. Cambridge University Press, Cambridge

  26. Mikolov T, Sutskever I, Chen K, Corrado G, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ (eds) Proceedings of the 26th International Conference on Neural Information Processing Systems (NIPS’13), vol 2. Curran Associates Inc., USA, pp 3111–3119

  27. Salton G (1971) The SMART retrieval system—experiments in automatic document processing. Prentice-Hall, Inc., Upper Saddle River, NJ, USA

  28. Shen C, Li T (2010) Multi-document summarization via the minimum dominating set. In: Proceedings of the 23rd International Conference on Computational Linguistics (COLING’10). Association for Computational Linguistics, Stroudsburg, PA, USA, pp 984–992

  29. Sheremetyeva S (2003) Natural language analysis of patent claims. In: Proceedings of the ACL-2003 workshop on patent corpus processing-volume 20. Association for Computational Linguistics, pp 66–73

  30. Shinmori A, Okumura M, Marukawa Y, Iwayama M (2003) Patent claim processing for readability: structure analysis and term explanation. In: Proceedings of the ACL-2003 workshop on patent corpus processing-volume 20. Association for Computational Linguistics, pp 56–65

  31. Takeuchi H, Uramoto N, Takeda K (2005) Experiments on patent retrieval at NTCIR-5 workshop. NTCIR-5

  32. Trappey AJ, Trappey CV, Wu C-Y (2009) Automatic patent document summarization for collaborative knowledge systems and services. J Syst Sci Syst Eng 18(1):71–94

  33. Tseng Y, Lin C, Lin Y (2007) Text mining techniques for patent analysis. Inf Process Manag 43(5):1216–1247

  34. Wang D, Li T (2010) Document update summarization using incremental hierarchical clustering. In: Proceedings of the 19th ACM international conference on information and knowledge management, CIKM’10. ACM, New York, pp 279–288. https://doi.org/10.1145/1871437.1871476

  35. Wang D, Zhu S, Li T, Gong Y (2012) Comparative document summarization via discriminative sentence selection. ACM Trans Knowl Discov Data 6(3):12:1–12:18. https://doi.org/10.1145/2362383.2362386

  36. Wang D, Zhu S, Li T, Gong Y (2012) Comparative document summarization via discriminative sentence selection. ACM Trans Knowl Discov Data 6(3):12

  37. Wei X, Croft WB (2006) LDA-based document models for ad-hoc retrieval. In: Proceedings of the 29th annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 178–185

  38. Xu J, Croft WB (1996) Query expansion using local and global document analysis. In: Proceedings of the 19th annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 4–11

  39. Xue X, Croft W (2009) Transforming patents into prior-art queries. In: Proceedings of the 32nd international ACM SIGIR conference on research and development in information retrieval. ACM, pp 808–809

  40. Yang Y, Pedersen JO (1997) A comparative study on feature selection in text categorization. ICML 97:412–420

  41. Zhang L, Li L, Li T (2015) Patent mining: a survey. ACM SIGKDD Explor Newsl 16(2):1–19

Acknowledgements

We thank the anonymous reviewers for their very useful comments and suggestions. This work was supported by the National Science Foundation of China under Grant 91646116, the Ministry of Education/China Mobile Joint Research Fund under Project 5-10, the Jiangsu Provincial Natural Science Foundation of China under Grant BK20171447, the Jiangsu Provincial University Natural Science Research of China under Grant 17KJB520024, and Nanjing University of Posts and Telecommunications under Grants NY214135 and NY215045.

Author information

Corresponding author

Correspondence to Tao Li.

About this article

Cite this article

Zhang, L., Liu, Z., Li, L. et al. PatSearch: an integrated framework for patentability retrieval. Knowl Inf Syst 57, 135–158 (2018). https://doi.org/10.1007/s10115-017-1127-0

  • DOI: https://doi.org/10.1007/s10115-017-1127-0
