Patent document clustering with deep embeddings

Kim, Jaeyoung; Yoon, Janghyeok; Park, Eunjeong; Choi, Sungchul

doi:10.1007/s11192-020-03396-7

Published: 23 March 2020

Scientometrics, Volume 123, pages 563–577 (2020)

Jaeyoung Kim¹,
Janghyeok Yoon²,
Eunjeong Park³ &
Sungchul Choi¹ (ORCID: orcid.org/0000-0002-5836-3838)

Abstract

The analysis of scientific and technical documents is crucial in the process of establishing science and technology strategies. One popular method for such analysis is for field experts to manually classify each scientific or technical document into one of several predefined technical categories. However, not only is manual classification error-prone and expensive, but it also requires extended efforts to handle frequent data updates. In contrast, machine learning and text mining techniques enable cheaper and faster operations, and can alleviate the burden on human resources. In this paper, we propose a method for extracting embedded feature vectors by applying a neural embedding approach for text features in patent documents and automatically clustering the embedding features by utilizing a deep embedding clustering method.
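
The clustering step named in the abstract can be illustrated with a minimal sketch. It assumes the deep embedded clustering (DEC) formulation of Xie, Girshick, and Farhadi (2016), in which each embedded point is softly assigned to cluster centroids with a Student's t kernel and the assignments are sharpened into a target distribution. The abstract does not give the authors' exact architecture or hyperparameters, so the function names, the `alpha` value, and the random stand-in embeddings below are all assumptions made for illustration. The sketch only computes the assignment quantities and the KL loss; a full DEC training loop would also update the encoder and centroids by gradient descent on that loss.

```python
import numpy as np

def soft_assign(z, centroids, alpha=1.0):
    """Student's t soft assignment q_ij of each embedding i to each centroid j."""
    # Squared Euclidean distances, shape (n_docs, n_clusters)
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened target p_ij: q_ij^2 / f_j, renormalized per row, with f_j = sum_i q_ij."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_divergence(p, q):
    """KL(P || Q) summed over all documents and clusters."""
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)
# Hypothetical stand-in for document embeddings: two separated blobs in 8-D.
# Real input would be embedding vectors extracted from patent text.
z = np.vstack([
    rng.normal(-3.0, 0.5, size=(20, 8)),
    rng.normal(3.0, 0.5, size=(20, 8)),
])
# Initial centroids taken as blob means here; DEC typically initializes with k-means.
centroids = np.stack([z[:20].mean(axis=0), z[20:].mean(axis=0)])

q = soft_assign(z, centroids)   # soft cluster memberships
p = target_distribution(q)      # sharpened targets
labels = q.argmax(axis=1)       # hard cluster labels
loss = kl_divergence(p, q)      # quantity DEC would minimize
```

Minimizing the KL divergence against the sharpened target pulls points toward the centroids they already favor, which is why this family of methods refines both the embedding and the cluster assignment at the same time.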

Acknowledgements

This work was supported by National Research Foundation of Korea (NRF) grants funded by the Korean government (Nos. NRF-2015R1C1A1A01056185 and 2018R1D1A1B07045825).

Author information

Authors and Affiliations

TEAMLAB, Department of Industrial Management Engineering, Gachon University, Seongnam-si, Gyeonggi-do, Republic of Korea
Jaeyoung Kim & Sungchul Choi
Department of Industrial Engineering, Konkuk University, Seoul, Republic of Korea
Janghyeok Yoon
NAVER, Bundang-gu, Seongnam-si, Gyeonggi-do, Republic of Korea
Eunjeong Park

Corresponding authors

Correspondence to Eunjeong Park or Sungchul Choi.

Additional information

Eunjeong Park and Sungchul Choi are co-corresponding authors.

About this article

Cite this article

Kim, J., Yoon, J., Park, E. et al. Patent document clustering with deep embeddings. Scientometrics 123, 563–577 (2020). https://doi.org/10.1007/s11192-020-03396-7

Received: 05 May 2018
Published: 23 March 2020
Issue Date: May 2020
DOI: https://doi.org/10.1007/s11192-020-03396-7

Mathematics Subject Classification

68U15
