Keyword standardization and restructuring: the impact on analysing network-based science maps in innovation management research

Scientometrics

Abstract

Keyword-based content analysis is experiencing a take-off period in science mapping. Within the family of co-word analyses, analyzing the keyword content is commonly preceded by computer-assisted preprocessing, which may leave substantial noise and bias in the structure of the resulting network. Despite these flaws, only a few articles have attempted to go beyond conventional keyword standardization steps, although leveraging expert knowledge holds the promise of reducing the tradeoff between interpretability and representativeness in scaled bibliometric studies. We propose systematic manual preprocessing, an algorithmic keyword standardization and restructuring (KSR) procedure, and this paper is a validation study of the method. The innovation management (IM) disciplinary area is used to demonstrate the extent to which the quality and interpretability of bibliometric networks change and improve if in-depth keyword standardization and restructuring is implemented. For the demonstration, two networks of more than 5000 articles were set up and analyzed using identical steps, keyword preprocessing being the only difference. The impact of the KSR procedure on the clusterings is considerable, and interpretation is greatly affected. Recommendations have been compiled for researchers who would like to build keyword-based science maps to analyze content.
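
To illustrate why keyword preprocessing can change a co-word network, the following is a minimal sketch and not the authors' KSR procedure. It assumes a small hand-curated mapping from keyword variants to a standard form (the `STANDARD_FORM` dictionary, the function names and the three toy papers are all hypothetical) and compares the keyword co-occurrence counts obtained with and without that mapping.

```python
from collections import Counter
from itertools import combinations

# Hypothetical expert-curated mapping from raw keyword variants to a standard form.
STANDARD_FORM = {
    "open innovations": "open innovation",
    "oi": "open innovation",
    "smes": "sme",
    "small and medium-sized enterprises": "sme",
}

def standardize(keywords):
    """Lowercase, trim, map variants to their standard form, drop duplicates."""
    cleaned = []
    for kw in keywords:
        kw = kw.strip().lower()
        kw = STANDARD_FORM.get(kw, kw)
        if kw and kw not in cleaned:
            cleaned.append(kw)
    return cleaned

def cooccurrence(papers):
    """Count how often each unordered keyword pair appears in the same paper."""
    counts = Counter()
    for keywords in papers:
        for a, b in combinations(sorted(keywords), 2):
            counts[(a, b)] += 1
    return counts

# Toy data (invented for illustration only).
raw_papers = [
    ["Open Innovation", "SMEs"],
    ["OI", "small and medium-sized enterprises"],
    ["open innovations", "absorptive capacity"],
]

# Conventional preprocessing only: lowercase and trim.
raw_edges = cooccurrence([[k.strip().lower() for k in p] for p in raw_papers])
# Preprocessing with the hypothetical standardization mapping.
std_edges = cooccurrence([standardize(p) for p in raw_papers])
```

In this toy case the unstandardized network has three unrelated edges of weight one, while the standardized network merges the first two papers into a single, heavier edge between "open innovation" and "sme". Any clustering run on the two edge lists would start from visibly different structures.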

Notes

  1. Namely, bibliographic coupling and co-citation analyses.

  2. For building science maps, the other three tools are co-author, bibliographic coupling and co-citation analyses. Computer-assisted textual analysis comprises a diverse set of methods, an overview of which is beyond the scope of this article. Word frequency counts, word clouds, sentiment analysis, topic modelling, text classification and natural language processing are illustrative examples of textual analysis. These methods are used in practice-oriented work as well as in academic research.

  3. Organization is a so-called polysemic term.

  4. Three search strings were combined for the management and business domains in Web of Science. 1. (innovation* AND keyword* AND “*network*” AND (“author*” OR “article*” OR “researcher*” OR “scholar*” OR “scientist*”)), 2. (innovation AND (“keyword* content analys*” OR (keyword* AND (cooccur* OR co-occur)) OR (coword OR co-word))), 3. (innovation* AND keyword* AND (cooccur* OR co-occur* OR coword OR co-word)). To count as relevant, an article must have analysed keyword co-occurrence in a network approach.

  5. Such as smart city, the systemic view of entrepreneurship supply chains and business models.

  6. The search profile in Table A3 of the Appendices can be used to download the metadata from the Web of Science database.

  7. The latter is also a valid approach. Our focus remains on content analysis, hence the effort to remove insignificant content representations from the network. Cosine similarity ranges from 0 to 1, where 1 indicates the strongest similarity.

  8. The principle of omitting the most frequent terms is analogous to the Term Frequency-Inverse Document Frequency (TF/IDF) approach. In our case, at the first clustering step, the global semantic context has already been set and at the second level a more granular semantic structure or context emerges.

  9. Papers not retained by the KSR method were assigned to a separate cluster.

  10. Papers not retained by any of the keyword similarity methods (the one used for RAW, and KSR for RESTRUCT) were assigned to a separate cluster.

  11. Most of the dropout occurs when the characteristic keywords, whose presence would have an adverse impact on delineating second-level communities, are omitted to enhance the discriminatory power of the clustering. This omission is common practice in setting up co-occurrence science maps. For details, please consult Tables A4a, A4b, A5a and A5b in the Appendices.

  12. For details, please consult Figures A3a and A3b in the Appendices.
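
Notes 7, 8 and 11 refer to cosine similarity, to the omission of the most frequent terms and to the resulting dropout of papers. The sketch below illustrates these ideas on invented data. It is an assumption-laden example rather than the procedure used in the article: it applies Salton's cosine to binary keyword sets, and the function names, the toy papers and the `top_n` cut-off are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Salton's cosine for two keyword sets treated as binary vectors."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def drop_most_frequent(papers, top_n):
    """Remove the top_n most frequent keywords from every paper."""
    # Document frequency: each keyword is counted once per paper.
    freq = Counter(kw for p in papers for kw in set(p))
    frequent = {kw for kw, _ in freq.most_common(top_n)}
    return [[kw for kw in p if kw not in frequent] for p in papers]

# Toy data (invented for illustration only).
papers = [
    ["innovation", "open innovation", "sme"],
    ["innovation", "open innovation", "absorptive capacity"],
    ["innovation", "business model", "smart city"],
]

# Similarity driven only by the ubiquitous keyword "innovation".
before = cosine(papers[0], papers[2])

# Omit the single most frequent keyword (hypothetical cut-off) and compare again.
trimmed = drop_most_frequent(papers, top_n=1)
after = cosine(trimmed[0], trimmed[2])
related = cosine(trimmed[0], trimmed[1])
```

With binary, non-negative vectors the value stays between 0 and 1. After the ubiquitous keyword is removed, the first and third papers no longer appear similar, while the first two remain linked through "open innovation". A paper whose keywords are all among the omitted terms would end up with an empty list and a similarity of 0 to every other paper, which is one way a paper can drop out of a second-level clustering.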

Acknowledgements

The authors would like to thank Philippe Mouricou for the comments and suggestions to the working paper and the anonymous reviewers for their work and constructive criticism of the submitted manuscript.

Funding

No funds, grants, or other support was received. The authors have no relevant financial or non-financial interest to disclose.

Author information

Corresponding author

Correspondence to Balázs Borsi.

Ethics declarations

Conflict of interest

We have no conflict of interest to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 3430 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Borsi, B., Vida, Z. & Soós, S. Keyword standardization and restructuring: the impact on analysing network-based science maps in innovation management research. Scientometrics 130, 593–617 (2025). https://doi.org/10.1007/s11192-025-05232-2

  • DOI: https://doi.org/10.1007/s11192-025-05232-2
