Abstract
Content analysis using keywords is experiencing a take-off period in science mapping. Within the family of co-word analyses, keyword content analysis is commonly preceded by computer-assisted preprocessing, which may leave substantial noise and bias in the structure of the resulting network. Despite these flaws, only a few articles have attempted to go beyond conventional keyword standardization steps, although leveraging expert knowledge holds the promise of reducing the tradeoff between interpretability and representativeness in large-scale bibliometric studies. We propose systematic manual preprocessing, an algorithmic keyword standardization and restructuring (KSR) procedure, and this paper is a validation study of the method. The innovation management (IM) disciplinary area is used to demonstrate the extent to which the quality and interpretability of bibliometric networks change and improve when in-depth keyword standardization and restructuring is implemented. For the demonstration, two networks of more than 5000 articles were set up and analyzed using identical steps, with keyword preprocessing being the only difference. The KSR procedure has a considerable impact on the clusterings and greatly affects their interpretation. Recommendations are provided for researchers who would like to build keyword-based science maps to analyze content.
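To make the contrast drawn in the abstract more concrete, the following is a minimal sketch of conventional, mechanical keyword standardization (case folding, whitespace and dash normalization) followed by a lookup in an expert-curated mapping. The mapping `EXPERT_MAP`, its entries and the helper names are hypothetical: they only stand in for the kind of manual decisions that a procedure such as KSR involves and are not the authors' rules.

```python
import re
from collections import Counter

# Hypothetical expert decisions: variant or synonym -> preferred form.
# In a manual procedure such as KSR these would come from domain experts;
# the entries below are invented for illustration only.
EXPERT_MAP = {
    "coword analysis": "co-word analysis",
    "r&d": "research and development",
    "open innovation paradigm": "open innovation",
}


def normalize(keyword):
    """Mechanical cleaning: case, surrounding/inner whitespace, dash variants."""
    kw = keyword.strip().lower()
    # Unify hyphen, Unicode hyphen and en dash, dropping spaces around them
    kw = re.sub(r"\s*[-\u2010\u2013]\s*", "-", kw)
    # Collapse runs of whitespace into a single space
    kw = re.sub(r"\s+", " ", kw)
    return kw


def standardize(keywords):
    """Normalize each keyword, then apply the expert mapping if an entry exists."""
    result = []
    for kw in keywords:
        kw = normalize(kw)
        result.append(EXPERT_MAP.get(kw, kw))
    return result


raw = ["Co-Word Analysis", "coword  analysis", "R&D",
       "Open innovation paradigm", "Open Innovation"]
print(Counter(standardize(raw)))
```

Mechanical cleaning alone would still keep "coword analysis" and "open innovation paradigm" apart from their preferred forms; only the curated mapping merges them, which is the kind of gap expert-driven preprocessing is meant to close.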






Notes
Namely, bibliographic coupling and co-citation analyses.
For building science maps, the other three tools are co-author, bibliographic coupling and co-citation analyses. Computer-assisted textual analysis comprises a diverse set of methods, an overview of which is beyond the scope of this article. Word frequency counts, word clouds, sentiment analysis, topic modelling, text classification and, more broadly, natural language processing are illustrative examples of textual analysis. These methods are used in practice-oriented analyses as well as in academic research.
“Organization” is a so-called polysemic term.
Three search strings were combined for the management and business domains in Web of Science: (1) (innovation* AND keyword* AND “*network*” AND (“author*” OR “article*” OR “researcher*” OR “scholar*” OR “scientist*”)); (2) (innovation AND (“keyword* content analys*” OR (keyword* AND (cooccur* OR co-occur)) OR (coword OR co-word))); (3) (innovation* AND keyword* AND (cooccur* OR co-occur* OR coword OR co-word)). In relevant articles, keyword co-occurrence must have been analysed using a network approach.
For example, smart city, the systemic view of entrepreneurship, supply chains and business models.
The search profile in Table A3 of the Appendices can be used to download the metadata from the Web of Science database.
The latter is also a valid approach. Our focus, however, is on content analysis, hence the effort to remove insignificant content representations from the network. For non-negative vectors such as keyword occurrence counts, cosine similarity ranges from 0 to 1, where 1 indicates the strongest similarity.
The principle of omitting the most frequent terms is analogous to the Term Frequency-Inverse Document Frequency (TF-IDF) approach, which down-weights terms that occur in many documents. In our case, the global semantic context has already been set at the first clustering step, and a more granular semantic structure emerges at the second level.
Papers not retained by the KSR method were assigned a separate cluster.
Papers not retained by either of the keyword similarity methods (the one used for RAW, and KSR for RESTRUCT) were assigned a separate cluster.
Most of the dropout occurs when characteristic keywords, whose presence would hinder the delineation of second-level communities, are omitted to enhance the discriminatory power of the clustering. This omission is common practice when setting up co-occurrence science maps. For details, please consult Tables A4a, A4b, A5a and A5b in the Appendices.
For details, please consult Figures A3a and A3b in the Appendices.
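The note on cosine similarity states that its values range from 0 to 1. A minimal sketch, assuming papers are represented as binary keyword occurrence vectors over a small invented vocabulary, shows why: with non-negative entries the dot product cannot be negative. The vectors, the vocabulary and the convention of returning 0 for an all-zero vector are assumptions made for illustration, not data or choices from the study.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length count vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention: an empty profile is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical binary keyword profiles of three papers over the vocabulary
# ["open innovation", "absorptive capacity", "business model", "smart city"]
paper_a = [1, 1, 0, 0]
paper_b = [1, 0, 1, 0]
paper_c = [0, 0, 0, 1]

print(round(cosine_similarity(paper_a, paper_a), 3))  # 1.0: identical profiles
print(round(cosine_similarity(paper_a, paper_b), 3))  # 0.5: one shared keyword
print(round(cosine_similarity(paper_a, paper_c), 3))  # 0.0: no shared keyword
```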
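The note comparing the omission of the most frequent terms with TF-IDF can be illustrated with a minimal sketch. It assumes the textbook weighting tf × log(N/df), which is only one of several TF-IDF variants, and uses invented keyword lists. A keyword that occurs in every paper receives an inverse document frequency of log(1) = 0, so its weight vanishes, much as if it had been dropped outright.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears at least once
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        tf = Counter(doc)
        weights.append({term: count * math.log(n_docs / df[term])
                        for term, count in tf.items()})
    return weights


# Hypothetical author-keyword lists of three innovation management papers
papers = [
    ["innovation", "open innovation", "absorptive capacity"],
    ["innovation", "business model", "smart city"],
    ["innovation", "open innovation", "business model"],
]

for w in tf_idf(papers):
    print({term: round(value, 3) for term, value in w.items()})
# "innovation" occurs in every paper, so log(3/3) = 0 removes its weight entirely
```

The difference is one of degree: TF-IDF shrinks the weight of common terms continuously, whereas omitting the most frequent keywords removes them in a single step.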
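The dropout described in the notes on papers not retained by the keyword similarity methods can be sketched as follows. The sketch makes two simplifying assumptions that are not taken from the study: the characteristic keywords are simply the k most frequent ones, and a paper left without any keyword is placed in a separate cluster with the hypothetical label -1. The function and constant names are also hypothetical.

```python
from collections import Counter

DROPOUT_CLUSTER = -1  # hypothetical label for papers not retained


def drop_characteristic_keywords(papers, k):
    """Remove the k most frequent keywords; split papers into retained and dropped.

    papers: dict mapping a paper id to its list of (standardized) keywords.
    Returns (retained, dropped): retained maps ids to their remaining keywords,
    dropped lists the ids whose keywords were all removed.
    """
    # Count each keyword once per paper (document frequency)
    freq = Counter(kw for kws in papers.values() for kw in set(kws))
    # Ties at the cut-off are broken arbitrarily in this sketch
    characteristic = {kw for kw, _ in freq.most_common(k)}
    retained, dropped = {}, []
    for pid, kws in papers.items():
        remaining = [kw for kw in kws if kw not in characteristic]
        if remaining:
            retained[pid] = remaining
        else:
            dropped.append(pid)
    return retained, dropped


papers = {
    "p1": ["innovation", "open innovation"],
    "p2": ["innovation"],
    "p3": ["innovation", "business model"],
    "p4": ["open innovation", "business model"],
}

retained, dropped = drop_characteristic_keywords(papers, k=1)
# Papers that lose every keyword cannot be clustered by keyword similarity,
# so they are assigned to the separate dropout cluster
labels = {pid: DROPOUT_CLUSTER for pid in dropped}
print(retained)
print(labels)
```

In this toy case only "innovation" is removed, and the paper whose sole keyword it was ends up in the dropout cluster, which is the mechanism behind most of the dropout described in the notes.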
Acknowledgements
The authors would like to thank Philippe Mouricou for comments and suggestions on the working paper, and the anonymous reviewers for their work and constructive criticism of the submitted manuscript.
Funding
No funds, grants, or other support were received. The authors have no relevant financial or non-financial interests to disclose.
Ethics declarations
Conflict of interest
We have no conflict of interest to disclose.
Supplementary Information
Electronic supplementary material is available for this article.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Borsi, B., Vida, Z. & Soós, S. Keyword standardization and restructuring: the impact on analysing network-based science maps in innovation management research. Scientometrics 130, 593–617 (2025). https://doi.org/10.1007/s11192-025-05232-2
DOI: https://doi.org/10.1007/s11192-025-05232-2
Keywords
- Content analysis
- Science mapping
- Bibliometric networks
- Keyword preprocessing
- Keyword content analysis
- Coword analysis
- Keyword standardization and restructuring (KSR)
- Innovation management
Mathematics Subject Classification
- 91D30 Social networks; opinion dynamics
- 91C20 Clustering in the social and behavioral sciences
- 68T30 Knowledge representation