A deep learning based method for extracting semantic information from patent documents

Liang Chen¹, Shuo Xu² (ORCID: orcid.org/0000-0002-8602-1819), Lijun Zhu¹, Jing Zhang¹, Xiaoping Lei¹ & Guancan Yang³

doi:10.1007/s11192-020-03634-y

Scientometrics, Volume 125, pages 289–312 (2020)
Published: 24 July 2020


Abstract

Text-based patent analysis relies on information extraction techniques. However, existing techniques suffer from obvious defects, such as a low degree of automation and unsatisfactory extraction accuracy. To address these problems, we first pre-define an information schema containing 17 types of entities and 15 types of semantic relations, and then annotate a dataset of 1010 patent abstracts, which is made freely available to the research community. Next, we propose a novel patent information extraction framework in which two deep learning models, BiLSTM-CRF and BiGRU-HAN, are used for entity identification and semantic relation extraction, respectively. Finally, to demonstrate the advantages of the new framework, we conduct extensive experiments, taking the SAO method and the PCNNs model as baselines at the framework and module levels, respectively. Experimental results show that our framework outperforms the traditional one in terms of automation and accuracy, and is capable of extracting fine-grained structured information from patent texts.
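In a BiLSTM-CRF tagger, the BiLSTM assigns each token a score for every tag (emission scores), and a linear-chain CRF layer adds learned tag-to-tag transition scores; at inference time the highest-scoring tag sequence is usually recovered with Viterbi decoding. The following is a minimal, dependency-free Python sketch of that decoding step only. The score values, the three-tag O/B/I scheme, and the name `viterbi_decode` are assumptions for illustration; the abstract does not specify the tagging scheme or any implementation details, so this is not the authors' code.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag path and its score for a linear-chain CRF.

    emissions[t][j]   -- score of tag j at token t (e.g. a BiLSTM output)
    transitions[i][j] -- score of moving from tag i to tag j
    """
    n_tags = len(transitions)
    # best[j] = best score of any path that ends in tag j at the current token
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best = []
        pointers = []
        for j in range(n_tags):
            # pick the best previous tag i for current tag j
            scores = [best[i] + transitions[i][j] for i in range(n_tags)]
            i_best = max(range(n_tags), key=lambda i: scores[i])
            new_best.append(scores[i_best] + emissions[t][j])
            pointers.append(i_best)
        best = new_best
        backpointers.append(pointers)
    # trace back from the best final tag to recover the full path
    last = max(range(n_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


if __name__ == "__main__":
    # Hypothetical 3-tag scheme, for illustration only.
    tags = ["O", "B", "I"]
    # A large negative score discourages the invalid O -> I transition.
    transitions = [
        [0.0, 0.0, -10.0],  # from O
        [0.0, 0.0, 1.0],    # from B
        [0.0, 0.0, 1.0],    # from I
    ]
    emissions = [
        [0.0, 2.0, 0.0],  # token 1
        [1.0, 0.0, 1.5],  # token 2
        [2.0, 0.0, 0.0],  # token 3
    ]
    path, score = viterbi_decode(emissions, transitions)
    print([tags[i] for i in path], score)  # ['B', 'I', 'O'] 6.5
```

The transition matrix is what lets the CRF layer enforce tag-sequence consistency (here, an I tag should not follow O) that independent per-token predictions cannot guarantee.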


Similar content being viewed by others

Patent Specialization for Deep Learning Information Retrieval Algorithms

Knowledge Powered Cooperative Semantic Fusion for Patent Classification

Research on Patent Information Extraction Based on Deep Learning



Acknowledgements

This research received financial support from the National Natural Science Foundation of China under Grant Number 71704169 and the Social Science Foundation of Beijing Municipality under Grant Number 17GLB074. We also thank the anonymous reviewers for their valuable suggestions and comments.

Author information

Authors and Affiliations

Institute of Scientific and Technical Information of China, Beijing, 100038, People’s Republic of China
Liang Chen, Lijun Zhu, Jing Zhang & Xiaoping Lei
Research Base of Beijing Modern Manufacturing Development, College of Economics and Management, Beijing University of Technology, Beijing, 100124, People’s Republic of China
Shuo Xu
School of Information Resource Management, Renmin University of China, Beijing, 100872, People’s Republic of China
Guancan Yang


Corresponding author

Correspondence to Shuo Xu.

Appendix

Entity identification can go wrong in two ways: (1) errors in entity boundary detection and (2) errors in entity type classification. A standard confusion matrix, in which rows indicate true entity types and columns predicted ones, records only the second type of error. To capture the first type as well, an extra column (ebd) is appended to the confusion matrix in Table 8 to count errors in boundary detection.
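The bookkeeping behind such a matrix can be sketched in a few lines of Python. This is a minimal sketch under our own assumptions, not the authors' evaluation code: entities are represented as hypothetical `(start, end, type)` token spans, a gold entity is counted in the ebd column when no predicted entity has exactly the same boundaries, and predicted entities that match no gold span are ignored. The function name `entity_confusion_matrix` and the entity type labels in the usage example are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Span = Tuple[int, int, str]  # (start token, end token, entity type)

EBD = "ebd"  # extra column: errors in boundary detection


def entity_confusion_matrix(
    gold: Iterable[Span], predicted: Iterable[Span]
) -> Dict[str, Dict[str, int]]:
    """Rows are true entity types; columns are predicted types plus 'ebd'."""
    # Index predicted entities by their exact boundaries.
    pred_by_span = {(start, end): etype for start, end, etype in predicted}
    matrix: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for start, end, true_type in gold:
        pred_type = pred_by_span.get((start, end))
        if pred_type is None:
            # No prediction with exactly the same boundaries: boundary error.
            matrix[true_type][EBD] += 1
        else:
            # Boundaries match; record the (possibly wrong) predicted type.
            matrix[true_type][pred_type] += 1
    return matrix


if __name__ == "__main__":
    # Hypothetical entity types, for illustration only.
    gold = [(0, 2, "Component"), (5, 6, "Function"), (8, 9, "Material")]
    predicted = [(0, 2, "Component"), (5, 6, "Material"), (8, 10, "Material")]
    matrix = entity_confusion_matrix(gold, predicted)
    for true_type, row in matrix.items():
        print(true_type, dict(row))
```

Matching on exact boundaries keeps the two error types disjoint: each gold entity lands either in a type cell or in the ebd column, never both. A looser criterion, such as partial overlap, would shift some counts from ebd into the type cells.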

Table 8 The confusion matrix of entity identification



About this article

Cite this article

Chen, L., Xu, S., Zhu, L. et al. A deep learning based method for extracting semantic information from patent documents. Scientometrics 125, 289–312 (2020). https://doi.org/10.1007/s11192-020-03634-y


Received: 02 January 2020
Published: 24 July 2020
Issue Date: October 2020
DOI: https://doi.org/10.1007/s11192-020-03634-y
