Identifying emerging technologies to envision a future innovation ecosystem: A machine learning approach to patent data

Scientometrics

Abstract

As technologies and industry value chains change rapidly, firms must identify emerging promising technologies that help them respond to disruptive external forces and that can be used to launch new businesses or improve existing ones. One of the most commonly used approaches to identifying emerging promising technologies is patent analysis. Patents have long been regarded as a useful source of data on technologies; accordingly, a number of previous studies have used patents to identify rising technologies. However, most previous studies have relied heavily on patent information alone in assessing promising technologies, whereas promisingness is also determined by various factors that patent information does not capture. To overcome this limitation, this study proposes a hybrid approach that considers both expert opinions and patent information to identify emerging promising technologies. For the analysis, we first developed a set of criteria for evaluating potentially valuable patents, then had experts evaluate a portion of the patents in a larger patent portfolio of interest against those criteria, and finally used the evaluation results to identify other potentially valuable patents among the remaining patents in the portfolio. Here, an active semi-supervised learning technique was applied, in which a small amount of labeled data (patents evaluated by experts) was used together with a large amount of unlabeled data (the other patents in the portfolio). The analysis model consists of two layers, patents and patent attributes, with patent attributes such as technology characteristics used to classify patents as promising or unpromising.
The proposed approach was applied to the automobile industry sector, and its usability was verified; the analysis results indicated that semi-supervised learning combined with active learning has the potential to search effectively for emerging promising technologies, or to filter out non-promising technologies, with less human input. With only a small set of labeled patents, a large set of patents could be labeled, which saves experts time and effort in evaluating patents. Methodologically, this is an early attempt to introduce active semi-supervised learning into patent analysis. Practically, the research findings enable expert opinions to be used effectively in identifying promising technologies and envisioning a future innovation ecosystem, striking a balance between data-driven and expert-driven decision-making.
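The combination of active learning and semi-supervised learning described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid classifier, the single numeric feature, the confidence measure, and the names `fit_centroids`, `predict_with_gap`, `active_self_training`, `ask_expert` and `min_gap` are all assumptions made for illustration. In each round the least certain unlabeled item is sent to an expert (active learning), while confidently predicted items receive pseudo-labels (self-training).

```python
from statistics import mean

def fit_centroids(pairs):
    """Toy classifier: one mean feature value per class (both classes must be present)."""
    classes = {label for _, label in pairs}
    return {c: mean([x for x, label in pairs if label == c]) for c in classes}

def predict_with_gap(centroids, x):
    """Return (nearest class, confidence); confidence is the distance gap to the runner-up."""
    ranked = sorted((abs(x - m), c) for c, m in centroids.items())
    return ranked[0][1], ranked[1][0] - ranked[0][0]

def active_self_training(labeled, pool, ask_expert, rounds=3, min_gap=7.0):
    labeled = list(labeled)  # (feature, label) pairs already evaluated by experts
    pool = list(pool)        # features of the still-unlabeled items
    n_queries = 0
    for _ in range(rounds):
        if not pool:
            break
        centroids = fit_centroids(labeled)
        scored = [(i, *predict_with_gap(centroids, x)) for i, x in enumerate(pool)]
        # Active-learning step: ask the expert about the least certain item.
        query_i = min(scored, key=lambda s: s[2])[0]
        labeled.append((pool[query_i], ask_expert(pool[query_i])))
        n_queries += 1
        taken = {query_i}
        # Self-training step: accept confident predictions as pseudo-labels.
        for i, label, gap in scored:
            if i != query_i and gap >= min_gap:
                labeled.append((pool[i], label))
                taken.add(i)
        pool = [x for i, x in enumerate(pool) if i not in taken]
    return labeled, pool, n_queries

# Hypothetical data: one score per patent; the "expert" is simulated by a threshold rule.
seed = [(0.0, 0), (10.0, 1)]  # 0 = unpromising, 1 = promising
unlabeled = [1.0, 2.0, 4.8, 8.0, 9.0]
result, remaining, n_queries = active_self_training(seed, unlabeled, lambda x: int(x >= 5))
```

In this toy run all five unlabeled items end up labeled after two expert queries, which is the kind of saving in expert effort the abstract refers to. A real application would replace the centroid model with a stronger classifier and tune the confidence threshold.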

Notes

  1. K-fold cross-validation is a statistical technique for estimating how well a model generalizes, and thereby for detecting overfitting, by splitting the training data set into K subsets. The training data set is first partitioned into K folds of approximately equal size. K iterations of training and validation are then performed, each using one fold for validation and the other K − 1 folds for training. Finally, the K validation results are combined, typically by averaging, to estimate the model’s performance.
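The procedure in this note can be sketched in a few lines of plain Python. This is an illustrative sketch only: the helper names `k_fold_indices` and `cross_validate`, the toy threshold model, and the data are assumptions, not taken from the article. Folds here are contiguous and unshuffled; in practice the data are usually shuffled or stratified first.

```python
from statistics import mean

def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # the first `extra` folds get one more item
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(xs, ys, k, fit, score):
    """Train on k-1 folds, validate on the held-out fold, and average the k scores."""
    scores = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(score(model, [xs[i] for i in held_out], [ys[i] for i in held_out]))
    return mean(scores), scores

# Hypothetical toy model: a threshold at the midpoint between the two class means.
def fit_threshold(xs, ys):
    positives = [x for x, y in zip(xs, ys) if y == 1]
    negatives = [x for x, y in zip(xs, ys) if y == 0]
    return (mean(positives) + mean(negatives)) / 2

def accuracy(threshold, xs, ys):
    return mean([1.0 if (x > threshold) == (y == 1) else 0.0 for x, y in zip(xs, ys)])

xs = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.15, 0.85, 0.25, 0.75]
ys = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
avg_accuracy, per_fold = cross_validate(xs, ys, k=5, fit=fit_threshold, score=accuracy)
```

Each sample is used for validation exactly once, so the averaged score reflects performance on data the model did not see during the corresponding training run.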

Acknowledgements

We thank Prof. Daniel Stefan Hain and Prof. Roman Jurowetzki for suggesting that active learning be applied in this study. We also thank Prof. ZHOU Yuan, Joseph for his constructive comments on the previous version of this paper.

Author information

Corresponding author

Correspondence to Sungjoo Lee.

Additional information

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science and ICT (NRF-2019R1F1A1063032).

Appendices

Appendix 1: Basic information about dataset

See Tables 8, 9, 10, 11, 12.

Table 8 Descriptive statistics for variables
Table 9 Means and standard deviations of labeled data: SVM
Table 10 Means and standard deviations of labeled data: random forest
Table 11 Means and standard deviations of labeled data: XGBoost
Table 12 Correlation coefficient matrix

Appendix 2: Data preprocessing results

See Table 13.

Table 13 Data pre-processing result for skewness and near-zero variance

Appendix 3: Results based on percentage of labeled data

See Table 14.

Table 14 Total performance evaluation

Appendix 4: Performance results in different classes (NPP)

See Table 15.

Table 15 Classification results by classifier

Appendix 5: ROC plots and performance results for each iteration

See Fig. 14 and Table 16.

Fig. 14 ROC plots by classifier

Table 16 Total performance over time

Appendix 6: Performance comparison with previous studies

See Table 17.

Table 17 Performance comparison with previous literature

Appendix 7: Results of degree centrality ranking of EVB technology

See Table 18.

Table 18 Results of degree centrality ranking of EVB technology

About this article

Cite this article

Choi, Y., Park, S. & Lee, S. Identifying emerging technologies to envision a future innovation ecosystem: A machine learning approach to patent data. Scientometrics 126, 5431–5476 (2021). https://doi.org/10.1007/s11192-021-04001-1

  • DOI: https://doi.org/10.1007/s11192-021-04001-1
