Abstract
With rapid changes in technology and industry value chains, it has become essential for firms to identify emerging promising technologies that can better respond to external disruptive forces and be used to launch new businesses or improve current ones. One of the most commonly used approaches to identifying emerging promising technologies is patent analysis. Patents have long been regarded as a useful source of data on technologies; accordingly, a number of previous studies have used patents to identify rising technologies. However, most previous studies have relied heavily on patent information alone in assessing promising technologies, whereas promisingness is also determined by various other factors that patent information does not explain. To overcome this limitation, this study proposes a hybrid approach that considers both expert opinions and patent information to identify emerging promising technologies. For the analysis, we first developed a set of criteria with which to evaluate potentially valuable patents, then had experts evaluate only a portion of the patents from a larger patent portfolio of interest based on those criteria, and finally used the evaluation results to identify other potentially valuable patents among the remaining patents in the portfolio. Here, an active semi-supervised learning technique was applied, in which a small amount of labeled data (patents evaluated by experts) was used together with a large amount of unlabeled data (the other patents in the portfolio). The analysis model consists of two layers, patents and patent attributes, with patent attributes such as technology characteristics used to classify patents into promising and unpromising ones.
The proposed approach was applied to the automobile industry sector, and its usability was verified; the analysis results indicated that semi-supervised learning combined with active learning has potential for effectively searching for emerging promising technologies or filtering out non-promising technologies with less human input. With only a small set of labeled patents, a large set of patents could be labeled, which saves the time and effort experts spend evaluating patents. Methodologically, this is an early attempt to introduce active semi-supervised learning in the context of patent analysis. Practically, the research findings enable expert opinions to be used effectively in identifying promising technologies and envisioning a future innovation ecosystem, striking a balance between data- and expert-driven decision-making.
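The labeling loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: a nearest-centroid classifier stands in for whatever classifier the study used, patents are reduced to hypothetical two-dimensional attribute vectors, and the names `ask_expert`, `threshold`, and `max_rounds` are assumptions introduced here. In each round, confident predictions are accepted as pseudo-labels (self-training) and the least confident patent is sent to an expert (active learning).

```python
from statistics import mean

def fit_centroids(points, labels):
    """Return {label: centroid} for 2-D attribute vectors."""
    centroids = {}
    for lab in set(labels):
        members = [p for p, l in zip(points, labels) if l == lab]
        centroids[lab] = (mean([p[0] for p in members]),
                          mean([p[1] for p in members]))
    return centroids

def predict_with_confidence(centroids, point):
    """Predict the nearest centroid; confidence is the relative distance margin in [0, 1]."""
    dists = sorted(
        (((point[0] - c[0]) ** 2 + (point[1] - c[1]) ** 2) ** 0.5, lab)
        for lab, c in centroids.items()
    )
    (d1, lab1), (d2, _) = dists[0], dists[1]
    conf = (d2 - d1) / (d2 + d1) if d2 + d1 > 0 else 0.0
    return lab1, conf

def active_self_training(seed, unlabeled, ask_expert, threshold=0.6, max_rounds=20):
    """Grow a labeled set from a small expert-labeled seed (hypothetical helper)."""
    labeled = dict(seed)          # attribute vector -> label (1 promising, 0 not)
    pool = list(unlabeled)
    expert_queries = 0
    for _ in range(max_rounds):
        if not pool:
            break
        centroids = fit_centroids(list(labeled), list(labeled.values()))
        scored = [(p, predict_with_confidence(centroids, p)) for p in pool]
        # Self-training step: accept confident predictions as pseudo-labels.
        for p, (lab, conf) in scored:
            if conf >= threshold:
                labeled[p] = lab
        # Active-learning step: ask the expert about the least confident patent.
        uncertain = [(p, conf) for p, (_, conf) in scored if conf < threshold]
        if uncertain:
            query, _ = min(uncertain, key=lambda item: item[1])
            labeled[query] = ask_expert(query)
            expert_queries += 1
        pool = [p for p in pool if p not in labeled]
    return labeled, expert_queries

# Toy data (assumed): two expert-labeled patents and six unlabeled ones.
seed = {(0.0, 0.1): 0, (1.0, 0.9): 1}
unlabeled = [(0.1, 0.0), (0.2, 0.2), (0.9, 1.0), (0.8, 0.8), (0.45, 0.5), (0.6, 0.55)]

def ask_expert(point):
    """Stand-in for a human evaluation; a simple rule plays the expert here."""
    return 1 if point[0] + point[1] > 1.0 else 0

final_labels, expert_queries = active_self_training(seed, unlabeled, ask_expert)
print(len(final_labels), "patents labeled with", expert_queries, "extra expert queries")
```

In this toy run the clear-cut patents are pseudo-labeled automatically and only the ambiguous ones near the class boundary reach the expert, which is the saving in human effort the abstract refers to.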
Notes
K-fold cross-validation is a statistical technique used to estimate how well a model generalizes, and thereby to detect overfitting, by splitting the training data set into K subsets. The training data set is first partitioned into K folds of equal size. K iterations of training and validation are then performed, with one fold used for validation and the other K − 1 folds used for training in each iteration. Finally, the validation results from the K iterations are combined (typically averaged) to estimate the model’s performance.
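The procedure in this note can be sketched in a few lines of plain Python. This is only an illustration: the threshold classifier, the toy data, and the function names (`k_fold_indices`, `cross_validate`, `fit_threshold`, `accuracy`) are assumptions made for the example and are not taken from the study.

```python
from statistics import mean

def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) for k contiguous folds of near-equal size."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

def cross_validate(xs, ys, k, fit, score):
    """Train on K - 1 folds, validate on the held-out fold, and average the scores."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx]))
    return mean(scores), scores

def fit_threshold(xs, ys):
    """Toy model: a 1-D threshold halfway between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (mean(pos) + mean(neg)) / 2

def accuracy(threshold, xs, ys):
    """Fraction of samples whose side of the threshold matches the label."""
    return mean([1.0 if (x > threshold) == (y == 1) else 0.0 for x, y in zip(xs, ys)])

# Toy data (assumed): one feature per sample, binary labels.
xs = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6, 0.15, 0.85]
ys = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

avg_score, fold_scores = cross_validate(xs, ys, k=5, fit=fit_threshold, score=accuracy)
print("per-fold accuracy:", fold_scores, "mean:", avg_score)
```

Each sample is used for validation exactly once, so the averaged score reflects performance on data the model did not see during the corresponding training round.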
Acknowledgements
We thank Prof. Daniel Stefan Hain and Prof. Roman Jurowetzki for their suggestion to apply active learning to this study. We also thank Prof. ZHOU Yuan, Joseph for his constructive comments on the previous version of this paper.
Additional information
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science and ICT (NRF-2019R1F1A1063032).
Appendices
Appendix 1: Basic information about dataset
Appendix 2: Data preprocessing results
See Table 13.
Appendix 3: Results based on percentage of labeled data
See Table 14.
Appendix 4: Performance results in different classes (NPP)
See Table 15.
Appendix 5: ROC plots and performance results for each iteration
See Fig.
Appendix 6: Performance comparison with previous studies
See Table 17.
Appendix 7: Results of degree centrality ranking of EVB technology
See Table 18.
Cite this article
Choi, Y., Park, S. & Lee, S. Identifying emerging technologies to envision a future innovation ecosystem: A machine learning approach to patent data. Scientometrics 126, 5431–5476 (2021). https://doi.org/10.1007/s11192-021-04001-1
DOI: https://doi.org/10.1007/s11192-021-04001-1