Measuring cognitive proximity using semantic analysis: A case study of China's ICT industry

Scientometrics

Abstract

Quantifying knowledge and technologies has long posed a challenge to the measurement of cognitive proximity. This paper proposes a method to measure cognitive proximity by mining patent description text with the LDA topic model. Using the patent-topic distribution obtained from the LDA topic model, cognitive proximity is measured between enterprises or within cities, which compensates for the shortcomings of existing measurement methods that are limited by the rigid IPC, industry classification systems, or non-standard interview data. Our empirical study of the ICT industry indicates that the 20 topics obtained through the topic model correspond well to the technologies involved in the industry's leading products and services. We also mine the knowledge and technology information in the patent text to depict the technology landscape, including changes in technology topics over time, differences in topic distribution across cities, and the development trend of the urban innovation network. The method's effectiveness is further supported by a model that compares different measurement methods in revealing the relationship between cognitive proximity and patent productivity. Finally, researchers can use this approach to delve deeper into urban innovation issues, and policymakers can use it to guide further innovation.
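The idea of deriving proximity from patent-topic distributions can be illustrated with a minimal sketch. The abstract does not state which similarity measure or aggregation the authors use, so the choices below (averaging patent-topic vectors per enterprise, then taking cosine similarity) are assumptions made only for illustration. The function names and toy data are hypothetical.

```python
import math
from collections import defaultdict


def mean_topic_vector(vectors):
    """Average a list of equal-length topic distributions."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(p, q):
    """Cosine similarity between two topic vectors (assumed proximity measure)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def enterprise_profiles(patents):
    """Aggregate (enterprise_id, topic_distribution) pairs into one profile per enterprise."""
    grouped = defaultdict(list)
    for owner, distribution in patents:
        grouped[owner].append(distribution)
    return {owner: mean_topic_vector(dists) for owner, dists in grouped.items()}


# Hypothetical toy data with 3 topics (the paper uses 20 topics).
patents = [
    ("firm_a", [0.8, 0.1, 0.1]),
    ("firm_a", [0.6, 0.3, 0.1]),
    ("firm_b", [0.7, 0.2, 0.1]),
    ("firm_c", [0.1, 0.1, 0.8]),
]

profiles = enterprise_profiles(patents)
proximity_ab = cosine_similarity(profiles["firm_a"], profiles["firm_b"])
proximity_ac = cosine_similarity(profiles["firm_a"], profiles["firm_c"])
```

In this toy example, firm_a and firm_b have nearly identical topic profiles and therefore a proximity close to 1, while firm_a and firm_c concentrate on different topics and score much lower.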

Data availability

The patent data and statistical yearbook data used in this paper are publicly available from the relevant official websites. Individual data from China's Economic Census are not public.

Code availability

The semantic analysis code contains technical processing details and cannot be disclosed at this time.

Funding

The research leading to these results received funding from the National Natural Science Foundation of China under Grant Agreement No. 41971157.

Author information

Contributions

All authors contributed to the study conception and design. Yawen Qin performed data collection and analysis. Yawen Qin wrote the first draft of the manuscript, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Xun Li.

Ethics declarations

Conflict of interest

The authors declare that they have no commercial or associative interest that represents a conflict of interest in connection with the work submitted.

Appendix

Topics and featured words

The output of the topic model includes the mapping between topics and feature words. Each topic generally contains many feature words, so the weight of any single feature word is relatively low. Table 3 lists the five feature words with the highest weights for each topic. A topic can be named according to its combination of feature words. For example, the feature words in topic0 are all related to battery technology, so topic0 can be labeled as battery.

Table 3 Topic-featured words distribution
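Selecting the highest-weighted feature words per topic, as described for Table 3, can be sketched as follows. The words and weights below are hypothetical placeholders rather than values from the paper, and the function name is an illustrative assumption.

```python
import heapq


def top_feature_words(topic_word_weights, k=5):
    """Return the k highest-weighted (word, weight) pairs for each topic."""
    return {
        topic: heapq.nlargest(k, weights.items(), key=lambda item: item[1])
        for topic, weights in topic_word_weights.items()
    }


# Hypothetical weights for one topic; not taken from the paper's Table 3.
topic_word_weights = {
    "topic0": {
        "battery": 0.031,
        "electrode": 0.024,
        "lithium": 0.022,
        "charging": 0.018,
        "cell": 0.015,
        "housing": 0.004,
    },
}

top = top_feature_words(topic_word_weights)
top_words_topic0 = [word for word, _ in top["topic0"]]
```

A reader can then inspect the selected words for each topic and assign a label by hand, which mirrors the naming step described above.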

Diversity and specialization of cities' innovation topics

The normalized information entropy of each city over the 20 topics is calculated from the city-topic distribution. Entropy reflects how diverse a city's innovation is: the higher the entropy value, the more diverse the city's innovation; the lower the entropy value, the more specialized it is. Figure 10 shows the relationship between the entropy value and the patent output of all cities with patents in the ICT industry. However, most cities have not formed ICT industry clusters, so this paper focuses on 26 cities with relatively continuous and high patent output; the results are shown in Fig. 11.
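A minimal sketch of a normalized entropy calculation is given below. It assumes the common definition in which Shannon entropy is divided by the logarithm of the number of topics, so that values fall between 0 (fully specialized) and 1 (evenly spread); the text does not spell out the exact normalization, so this is an assumption. The function name and sample distributions are hypothetical.

```python
import math


def normalized_entropy(distribution):
    """Shannon entropy of a topic distribution, divided by log(K) for K topics."""
    total = sum(distribution)
    k = len(distribution)
    if total <= 0 or k < 2:
        raise ValueError("need at least two topics and a positive total weight")
    entropy = 0.0
    for weight in distribution:
        p = weight / total  # renormalize so the shares sum to 1
        if p > 0:  # terms with p == 0 contribute nothing
            entropy -= p * math.log(p)
    return entropy / math.log(k)


# Hypothetical city-topic distributions over 20 topics.
diverse_city = [1 / 20] * 20              # evenly spread over all topics
specialized_city = [1.0] + [0.0] * 19     # concentrated in a single topic
mixed_city = [0.5, 0.5] + [0.0] * 18      # split across two topics

entropy_diverse = normalized_entropy(diverse_city)
entropy_specialized = normalized_entropy(specialized_city)
entropy_mixed = normalized_entropy(mixed_city)
```

Under this definition, a city whose patents are spread evenly across all topics scores close to 1, and a city concentrated in a single topic scores 0.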

Fig. 10 Correlation between patent output and innovation entropy

Fig. 11 Correlation between patent output and innovation entropy in 26 cities

Among the 26 cities, Nantong, Ningbo, and other cities in the Yangtze River Delta are the most specialized. In contrast, Shenzhen, Beijing, and other cities have the most diversified ICT industry innovation.

About this article

Cite this article

Qin, Y., Qin, X., Chen, H. et al. Measuring cognitive proximity using semantic analysis: A case study of China's ICT industry. Scientometrics 126, 6059–6084 (2021). https://doi.org/10.1007/s11192-021-04021-x

  • DOI: https://doi.org/10.1007/s11192-021-04021-x
