1 Introduction

At the start of China’s reform and opening up, Deng Xiaoping highlighted that “it is very arduous to develop the economy without opening up” [1]. Over the past 40 years of reform and opening up, the theory behind China’s opening-up policy has kept pace with the times, and its connotation has been constantly enriched and improved at important conferences [2]. Since China entered the new era of building socialism with Chinese characteristics, the Fifth Plenary Session of the 18th Central Committee of the Communist Party of China (CPC) has adhered to a people-centered development philosophy and put forward a development concept comprising innovation, coordination, green development, openness, and sharing [3]. Open development, as one of the five development concepts for the medium- and long-term development of China, provides a guide for promoting the transformation and upgrading of China’s open economy and its high-quality development. The Fourth Plenary Session of the 19th Central Committee of the CPC clearly stated that we should “build a new open economic system with a higher level, and implement a comprehensive opening up in a wider range, wider fields and deeper levels” [4].

However, the opening-up policy is a double-edged sword. Although it promotes high-quality development of China’s economy, it also brings a series of challenges to the prevention and control of infectious diseases. In an open, inter-provincial environment, the probability of inter-provincial infection and difficulty of epidemic prevention and control both increase. The outbreak of COVID-19 in Wuhan around the Spring Festival of 2020 and rapid spread across the country is closely related to the free flow of various elements in the context of inter-provincial openness. With an inter-provincial openness index system as basis, this study follows the law of epidemic transmission and proposes hypotheses regarding the relationship between inter-provincial openness and epidemic spread, and through empirical research, validates and analyzes the structural elements of epidemic transmission in the context of inter-provincial openness. It then draws conclusions on the relationship between inter-provincial openness and the spread of the epidemic.

2 Literature review

Experts and scholars worldwide in epidemiology, statistics, and related fields have studied the factors behind epidemic spread. The methods used are mainly epidemiological and statistical, and the studies are mainly quantitative and empirical. Related research can be classified into the following categories.

The first category includes general studies of the factors of epidemic transmission. For instance, Yang [5] constructed a social vulnerability evaluation index system and conducted a general analysis of the factors causing the outbreak of major infectious diseases, while Zhang et al. [6], using quantitative and spatial simulation analyses, discussed a model framework regarding the impact of urbanization on the spread of infectious diseases.

The second category includes studies that focused on the specific factors concerning epidemics. Qiaohong et al. [7] formulated technical guidelines for the investigation, prevention, and control of the norovirus infection in 2015 and found that its transmission routes included human-to-human transmission, and food and water sources, which have many risk factors. Sharia et al. [8] summarized sporadic cases and outbreaks of norovirus infectious diarrhea from 1997 to 2011 and found that 79% of the cases and 71% of the outbreaks occurred during the cold season [8], suggesting that epidemic spread is related to seasonality. Xiang et al. [9] examined human avian influenza cases in China from 2005 to 2009 and also found that human avian influenza cases had obvious seasonal distribution characteristics. Wang [10] analyzed the peak of the intestinal epidemic in Hebei Province from 1958 to 1963 and concluded that the outbreak and prevalence of the epidemic were caused by multiple factors, among which, the role of social factors was far greater than that of natural factors. Therefore, epidemic disease is not only a physiological phenomenon of individuals, but also a social problem that is closely related to economic and social development, living customs, natural environmental changes, transportation, and international exchanges [10]. Guoning et al. [11] believed that temporal and spatial changes in the epidemic situation are affected by natural meteorological and socioeconomic factors [11]. This broadens the perspective of the analysis of the factors of epidemic spread, which point to natural and social factors.

The third category includes studies that examined the influence of government prevention and control behavior on the spread of epidemics. Li et al. [12], based on game theory, established a spread and diffusion equation of the human avian influenza epidemic and tested the data related to the prevention and control of H7N9 in China in 2013. They believed that the transmission route of the human avian influenza is complex, the necessary form of epidemic prevention and control is severe, and government intervention is undoubtedly significant to the effective control of the epidemic [12]. Youquan [13] conducted a study on the intervention behavior of the British government in the 1745–1758 outbreak of rinderpest and analyzed and summarized the subjective mistakes and objective limitations of the British government in their response to the epidemic. In 2002, the European Union promulgated the general food law, based on which food and animal health risks are analyzed and controlled from the government’s perspective [14]. The government plays a crucial role in the prevention and control of epidemics. In reality, the government’s behavior is an important factor in curtailing the spread of the epidemic.

The fourth category includes studies that conducted modeling and analysis of factors affecting epidemic spread based on multidisciplinary knowledge. With the improvement of mathematical modeling methods, the development of computer technology, and the application of big data mining technology, the effective combination of empirical data and theoretical models can more effectively reflect real-world situation. Zhongyuan [15] analyzed some important empirical results on epidemic transmission through complex networks over the past 20 years. The findings have deepened our understanding of epidemic transmission in the real world from the perspective of complex networks, making it possible to predict and control the spread of epidemics [15]. McCluskey [16] analyzed the occurrence, transmission rules, and factors affecting infectious diseases using the infectious disease transmission model. Zhang [17] added a latent period and “random reconnection” mechanism into an adaptive network and proposed an infectious disease transmission model in a dynamic social network. These analyses all involve factors that influence the spread of epidemics.

COVID-19 is still ongoing, and studies on the pneumonia epidemic have also progressed rapidly. As of 12 o’clock on May 13, 2021, an Internet search for “COVID-19” on China National Knowledge Infrastructure (CNKI) provided 19,700 published Chinese documents. These documents are mainly divided into three categories. The first category is the introduction of new prevention and control measures and action taken by the local government, mainly from newspapers and other documents. For example, the COVID-19 Prevention and Control Leading Group Office (2021) issued a circular to explain how to manage prevention and control work for COVID-19 during a specific period in Zhengzhou [18]. The second category is a COVID-19 preview triage of drug use, prevention and control guidelines, and psychological aid literature. For example, Yuan et al. [19] analyzed the proper usage and regulation of COVID-19 medicine. The third category involves putting forward prevention and control measures for pneumonia transmission based on empirical or qualitative analyses. For example, Ouyang et al. [20] proposed measures to deal with the rapid spread of COVID-19 in Wuhan. These documents did not involve an analysis of the factors affecting the spread of COVID-19.

We found that existing studies have analyzed the spread of the epidemic from both natural and social aspects, including factors such as climate, temperature, water resources, and environmental changes, as well as economic and political factors, and living customs. Existing research has enriched knowledge on epidemic transmission and prevention and control, and has played an important role in guiding the practice of epidemic prevention and control. However, extant research on the social factors affecting the spread of epidemics is not comprehensive and in-depth, and the related analyses are not systematic. Under conditions of inter-provincial openness, great changes have taken place in the behaviors of individuals, organizations, and governments. Openness accelerates mobility of resource elements and has a specific impact on the spread of the epidemic. Therefore, it is necessary to reexamine epidemic transmission with a different perspective and include into the analysis framework “openness” indicators that affect the spread of the epidemic, to enrich our understanding of the structural elements involved in epidemic transmission.

3 Data collection, measurement, and methodology

To carry out this study, we must first make clear the choice of variables, the construction of index system, the source of data collection and the use of measurement methods. In this part, we will analyze these problems.

3.1 Dependent variable selection and theoretical hypotheses

Since the end of 2019, COVID-19 has spread rapidly in China. See Fig. 1 below for the distribution of confirmed cases in each province. The figure reveals obvious differences in the number of confirmed cases. What factors have affected the spread of the epidemic and led to such large differences in the diagnosis rate among provinces? We answer this question from the perspective of inter-provincial openness.

Fig. 1 The epidemic situation in China’s provinces

The indicators for measuring the spread of COVID-19 mainly include the numbers of confirmed cases, suspected cases, and people under isolated observation. As the prevention and control situation changed, the number of clinically diagnosed cases was also included in the index system that captures the epidemic’s spread. Among these indicators, the number of confirmed cases is relatively reliable and is suitable as the basis for a dependent variable. The numbers of suspected cases and isolated observations are affected by many factors and are highly uncertain; thus, they are not suitable as dependent variables for this study. Although the number of confirmed cases is also affected by factors such as diagnosis and treatment methods and admission conditions, it is generally more certain. Expressing confirmed cases as an incidence rate makes the epidemic situations of provinces with different population sizes comparable. Therefore, this study takes the incidence rate as the dependent variable. Differences in the flow of infection sources (virus-carrying people) are undoubtedly an important factor behind the differences in incidence rate among provinces.
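As a minimal illustration of why a rate rather than a raw count serves as the dependent variable, the sketch below computes confirmed cases per 100,000 residents. The per-100,000 scaling, the function name, and the two provinces are assumptions made for illustration; the text does not state the exact scaling used, and the figures are not data from this study.

```python
def incidence_rate(confirmed_cases, resident_population, per=100_000):
    """Confirmed cases per `per` residents (the scaling is an assumed convention)."""
    if resident_population <= 0:
        raise ValueError("resident_population must be positive")
    return confirmed_cases / resident_population * per


# Hypothetical provinces: identical case counts, very different populations.
provinces = {
    "Province A": (1_000, 50_000_000),
    "Province B": (1_000, 5_000_000),
}

# Rates put the two provinces on a common, population-adjusted footing.
rates = {name: incidence_rate(cases, pop) for name, (cases, pop) in provinces.items()}
```

With equal case counts, the smaller province has the higher rate, which is the inter-provincial difference the dependent variable is meant to capture.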

Current research shows that the main transmission routes of COVID-19 are through droplets and contact. According to the emergency response law of the People’s Republic of China, public emergencies are mainly divided into four categories: natural disasters, accidents, public health events, and social security incidents [21]. The epidemic situation of infectious diseases is a typical public health event, and its spread comprises three links: source of infection, route of transmission, and susceptible population. The source of infection of COVID-19 is mainly people who carry the virus, and its spread is mainly through droplet transmission and contact between people. In the context of inter-provincial opening, we assume that some indicators closely related to inter-provincial openness constitute the factors affecting the spread of this epidemic. These “openness” factors contribute to the spread of COVID-19 in two ways: the source of infection and the route of transmission, and the degree of the epidemic’s spread also differs depending on the degree of openness between provinces. Based on certain assumptions, combined with the understanding of inter-provincial openness, we constructed and verified a schematic diagram of the transmission path and factors affecting the spread of COVID-19. A schematic diagram is shown in Fig. 2.

Fig. 2 The transmission path and factors affecting the spread of COVID-19 in the context of inter-provincial openness

3.2 The inter-provincial openness index system: design and selection of factors

In general, “openness” means opening up to the world. When evaluating a province’s degree of openness, it generally refers to the province’s openness to the world. However, building a new system for a higher-level, open economy means opening up domestic and international markets, making efficient use of domestic and international resources, forming a unified and open modern market system with orderly competition, giving full play to the decisive role of the market regarding resource allocation, greater involvement from the government, and promotion of an orderly and free flow of domestic and international resources and efficient global allocation [22]. Therefore, opening up includes not only the opening of a country or region to foreign countries, but also the opening up of different domestic regions. In addition, “if the vertical relationship between local governments is mainly political and administrative, then the horizontal relationship between local governments is mainly of economic significance” [23]. That is, although the content of opening up is very rich, “economic work is the central work” [24]. Local governments take economic opening up as the core of opening up. Since the start of China’s reform and opening up, with the adjustment of inter-governmental power and responsibility relations, China’s intergovernmental relations have gradually tended toward a network model [25], and the horizontal and oblique links between local governments have been vigorously developed, thereby greatly expanding the scope and level of mutual openness between local governments in China. The concept of open development includes active, two-way, fair, comprehensive, and cooperative openness, and other important ideas [26]. Therefore, openness is a distinctive feature of mutual openness, and the free flow of various elements and resources is an inevitable requirement for mutual openness.

For a long time, China’s opening-up policy has been comprehensive, all-round, and wide-ranging. It includes both opening up to the outside world and domestically, that is, the opening up of various regions and departments in China. In terms of content, openness to the outside world includes openness of the economy, politics, science, technology, and culture [27]. Opening up should “focus on economic opening up and promote open cooperation in various fields such as politics, military, and social culture, so as to achieve balanced opening up” [28]. This suggests that opening up is mainly economic, but also covers other aspects. At present, research on the index system of the openness of provincial administrative regions echoes the main content of economic openness, mainly focusing on the content of the economic index system of opening up provincial administrative regions, and also covers some other content. For example, Wang [29] selected the basis, structure and scale, quality and efficiency, and the potential of opening up as the four first-level indicators of the index system. Combined with the economic development of Zhejiang Province, Wang [29] selected 22 secondary indicators to support the first-level indicators, and used the entropy method to calculate and analyze the economic openness indicators of 11 cities in Zhejiang Province. Sun et al. [30] believed that the evaluation indicators of an open economy are mainly divided into two categories: one is based on the economic opening-up rules (or economic opening-up policies), and the other is based on the results of economic opening up. Based on the analysis, the evaluation indicators for opening up the economy to the outside and inside are constructed [30]. Cheng et al. [31] built an inland economic openness index system that included three target layers, six first-class indicators, and 21 secondary indicators, and performed a comparative analysis of Chongqing’s economic openness from vertical and horizontal levels using principal component analysis. Based on their analysis, Wang et al. [32] constructed an index system of urban innovation culture. In addition to research on the provincial-level administrative region openness indicator system, research on the inter-provincial openness indicator system is mainly focused on the economic openness level. For example, Li et al. [33] combined expert scoring and principal component analysis to construct the openness indexes for the total economic, domestic, import, and global levels from three dimensions: opening up to the inside, bringing in, and going out, which were used to measure how open the economy is in China’s 31 provinces (autonomous regions and municipalities) from 2003 to 2012. Yang et al. [34] examined the possible regional heterogeneity and spatial homogeneity of industrial green innovation under the unbalanced development of inter-provincial openness in China, and constructed an evaluation index system for inter-provincial openness, including five first-level indicators and five second-level indicators.

Inter-provincial openness mainly refers to the opening up of each provincial administrative region within a country, including economic, social, and cultural openness, and openness in terms of science and technology and personnel flow. Openness must be accomplished in two aspects: “One is opening to the outside world, and the other is to invigorate the domestic economy. Reform is to invigorate, and to invigorate domestically is to open to the inside. In fact, they are all called open policies” [35]. Therefore, inter-provincial openness is an integral part of the country’s overall opening up strategy. Inter-provincial openness is also based on economic openness, but it includes other aspects. Existing studies on the openness of provincial administrative regions and inter-provincial openness index systems have focused on the analysis of economic openness index systems while ignoring the analysis of other levels of openness index systems. We are guided by the strategic goal of coordinating and advancing the overall layout of the “five in one” guideline in the new era, and thereby build an inter-provincial openness index system from five aspects: economic, political, cultural, social, and ecological civilization openness. The construction of the index system is based on the content and characteristics of inter-provincial openness and draws on the analysis of existing relevant literature (Table 1).

Table 1 Index system of inter-provincial openness

The index system of inter-provincial openness (see Table 1) includes five first-level indicators, 13 second-level indicators, and 49 third-level indicators. As shown in Fig. 2, the spread of the epidemic follows certain rules, and not all indicators are related to it. Therefore, based on the existing literature on the law of epidemic transmission, combined with the characteristics and contents of inter-provincial opening up, we selected 11 of the 49 third-level indicators, which cover natural and social factors affecting the spread of the epidemic. The natural factors are the temperature and relative humidity of each province, while the social factors are the proportion of inflow population, inter-provincial distance, per capita GDP, transportation convenience, urbanization rate, business environment index, passenger transport density, population density, and equity mutual investments. Inter-provincial opening up mainly comprises economic opening up; therefore, these factors are mostly economic. Correlation analysis shows that the 11 selected indicators are significantly related to the dependent variable (morbidity) identified in this study.
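The correlation screening step can be sketched as follows. This is an illustrative simplification, not the procedure run in the study: it keeps candidate indicators whose absolute Pearson correlation with the incidence rate reaches a cutoff, whereas the text reports a significance test whose details are not given. The function names, the cutoff of 0.5, and all data values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def screen_indicators(candidates, y, min_abs_r):
    """Keep indicators whose |r| with y is at least `min_abs_r` (assumed rule)."""
    kept = {}
    for name, values in candidates.items():
        r = pearson_r(values, y)
        if abs(r) >= min_abs_r:
            kept[name] = r
    return kept


# Hypothetical data for four provinces (not taken from the study).
incidence = [2.0, 4.0, 6.0, 8.5]
candidates = {
    "inflow_share": [1.0, 2.0, 3.0, 4.0],      # tracks incidence closely
    "unrelated_index": [1.0, -1.0, 1.0, -1.0],  # weakly related
}
selected = screen_indicators(candidates, incidence, min_abs_r=0.5)
```

A threshold on |r| is only a stand-in for a formal test; with a sample as small as the set of provinces, a significance test on r would be the more defensible criterion.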

3.3 Basic information of the variables

Based on the theme of this study, we select the relevant variables (see Table 2 for the definitions). Y is the dependent variable that reflects the severity of the epidemic: the higher the value of Y, the more severe the epidemic’s spread. The inter-provincial difference in Y reflects the difference in the degree of epidemic transmission among provinces. X1–X11 are the independent variables that explain the differences in inter-provincial epidemic transmission given inter-provincial openness, and they further reflect the extent to which the structural factors related to inter-provincial openness have affected epidemic transmission. Among them, X1–X9 are social factors, and X10–X11 are natural factors. The main sources of data for this study, and the software used to analyze them, are described as follows.

  1. The proportion of inflow population comes from Baidu Map Insights-Statistical Analysis of Baidu Migration Big Data.

  2. Population density, per capita GDP, distance between provincial capitals, passenger transport density, and traffic convenience are calculated from data obtained from the China Statistical Yearbook and relevant government websites.

  3. Data on the temperature and relative humidity of each province are from the National Meteorological Information Center.

  4. Data on the equity mutual investments between Wuhan and other provinces come from the national enterprise credit information publicity system, which is gathered from enterprise data.

  5. All the data in this paper are analyzed using SAS 9.4.

Table 2 Variable names and definitions

4 Results

4.1 Cross-validation of the partial least squares factors

This study focuses on the structural elements of the spread of COVID-19 in the context of inter-provincial openness. Owing to data limitations, the independent variables are multicollinear, and the sample size is small. If ordinary least squares (OLS) were used with these independent variables, the estimates would be unreliable, with problems such as over-fitting, uninterpretable regression coefficients, and poor predictions. Meanwhile, if principal component regression (PCR) were used, the extracted components might not explain the dependent variable sufficiently. Therefore, considering the subject and independent variables of this study, partial least squares (PLS) regression is the most suitable analysis method. PLS can regress multiple dependent variables on multiple independent variables simultaneously, remains applicable when the sample is small, and can still yield an accurate regression equation. PLS thus has advantages over other regression methods for multi-variable regressions on small samples. To use PLS, we first need to determine the number of PLS factors. Using the SAS software, the results (Table 3) are obtained with the rounding cross-validation method.
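The idea of choosing the number of PLS factors by cross-validation can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the SAS analysis behind Table 3: it fits a single-response PLS model (PLS1) with the NIPALS algorithm on standardized predictors and scores each candidate number of factors by its leave-one-out predicted residual sum of squares (PRESS). Leave-one-out is used as a stand-in because the text does not specify the cross-validation scheme precisely; the function names and the small data set are hypothetical.

```python
import math


def column_stats(rows):
    """Mean and sample standard deviation of each column."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
           for c, m in zip(cols, means)]
    return means, sds


def fit_pls1(X, y, n_factors):
    """Fit PLS1 by NIPALS; predictors are standardized, y is centered."""
    x_mean, x_sd = column_stats(X)
    y_mean = sum(y) / len(y)
    E = [[(v - m) / s for v, m, s in zip(row, x_mean, x_sd)] for row in X]
    f = [v - y_mean for v in y]
    factors = []
    for _ in range(n_factors):
        # Weight vector: direction in X-space of maximal covariance with y.
        w = [sum(row[j] * r for row, r in zip(E, f)) for j in range(len(x_mean))]
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0.0:
            break  # nothing left in y for X to explain
        w = [v / norm for v in w]
        t = [sum(a * b for a, b in zip(row, w)) for row in E]   # scores
        tt = sum(v * v for v in t)
        p = [sum(row[j] * ti for row, ti in zip(E, t)) / tt for j in range(len(w))]
        q = sum(fi * ti for fi, ti in zip(f, t)) / tt           # y-loading
        # Deflate X and y before extracting the next factor.
        E = [[v - ti * pj for v, pj in zip(row, p)] for row, ti in zip(E, t)]
        f = [fi - q * ti for fi, ti in zip(f, t)]
        factors.append((w, p, q))
    return {"x_mean": x_mean, "x_sd": x_sd, "y_mean": y_mean, "factors": factors}


def predict_pls1(model, x):
    """Predict y for one observation by replaying the factor extraction."""
    e = [(v - m) / s for v, m, s in zip(x, model["x_mean"], model["x_sd"])]
    y_hat = model["y_mean"]
    for w, p, q in model["factors"]:
        t = sum(a * b for a, b in zip(e, w))
        y_hat += q * t
        e = [v - t * pj for v, pj in zip(e, p)]
    return y_hat


def loo_press(X, y, n_factors):
    """Leave-one-out PRESS for a model with `n_factors` factors."""
    press = 0.0
    for i in range(len(X)):
        model = fit_pls1(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_factors)
        press += (y[i] - predict_pls1(model, X[i])) ** 2
    return press


# Hypothetical data: the first two predictors are nearly collinear.
X = [[1.0, 2.1, 0.5], [2.0, 3.9, 1.5], [3.0, 6.2, 0.7],
     [4.0, 7.8, 1.9], [5.0, 10.1, 1.1], [6.0, 12.0, 2.2]]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1]
press_by_k = {k: loo_press(X, y, k) for k in (1, 2)}
```

The candidate with the smallest PRESS is the natural starting point; a smaller number of factors may still be preferred when its PRESS is not meaningfully worse, which is the parsimony argument made in this section.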

Table 3 Cross-validation of the partial least squares factors

The cross-validation results show that the root mean predicted residual sum of squares (PRESS) reached its minimum value (0.779069); the corresponding factors explain 71.25% of the variance of the independent variables and 76.62% of the variance of the dependent variable. However, to simplify the model, the number of extracted factors should be reduced as much as possible when the difference is not significant.

4.2 Parameter estimation results and analysis of the partial least squares regression model

According to the p value in Table 3, the cross-validation result with one factor does not differ significantly from the minimum. Therefore, this study extracts one factor for the PLS regression analysis, and the final results are shown in Table 4.

Table 4 Parameter estimation results and analysis of the partial least squares regression model

The parameter estimation results show that, except for the distance from each province to Wuhan, which is negatively correlated with the incidence, all influential factors are positively correlated with the incidence. Specifically, there is a positive correlation between the proportion of the population flowing into each province from Wuhan and the incidence rate. That is, the higher (lower) the proportion of the population flowing into a province from Wuhan from January 1 to 24, 2020, the higher (lower) the incidence in that province. The distance from the provincial capitals to Wuhan is negatively related to the incidence in each province. That is, the farther a provincial capital is from Wuhan, the lower the incidence, and vice versa. Several other social factors, including each province’s population density, transportation convenience, per capita GDP, urbanization rate, business environment index, passenger transport density, and the equity mutual investments ratio between Wuhan and each province, are all positively correlated with the incidence in the respective province. Population density is measured as the ratio of the number of permanent residents of each province to the land area of the province at the end of 2018. Transportation convenience is measured by the number of daily passenger services (trains and flights) from the capital of each province to Wuhan. The per capita GDP of each province is measured using the province’s 2019 GDP. The urbanization rate of each province is calculated as the ratio of the urban permanent population to the total permanent population of the province at the end of 2018, and the proportion of equity mutual investments is measured by the ratio of the number of equity mutual investment transactions between each province and Wuhan to the number of companies surviving in each province as of October 2019 (in units of 10,000). These nine factors are social factors closely related to inter-provincial openness.
Meanwhile, the standardized correlation coefficients show that the variables reflecting the economic development level and degree of openness of each province have a greater impact on the incidence rate, which highlights the importance of openness and the level of economic development when looking at the epidemic’s spread from the perspective of inter-provincial openness. The remaining two variables are natural factors, namely temperature and relative humidity, and these two variables are positively correlated with the incidence rate.

The parameter estimation results obtained by constructing the PLS regression model also verify the relationship between some factors that are closely related to inter-provincial openness and the spread of the epidemic. Under conditions of inter-provincial openness, openness and the free flow of various element resources have become distinctive features. Inter-provincial openness has not only strengthened the ties between regions and increased the level of mutual benefit of regional economies, but also brought unprecedented crises to cities, especially mega-cities, which are the leaders of the open economy [36]. Therefore, we must include new factors that can help us scientifically understand the law of epidemic transmission, effectively block the path of epidemic spread, and improve the performance level of epidemic prevention and control.

4.3 Regression test: precision analysis and influence mechanism of the partial least squares regression model

The variable importance in projection (VIP) refers to the importance of the independent variable Xi in explaining the dependent variable Y. If the VIP value is greater than 1, this indicates that the independent variable plays a very important role in explaining the dependent variable. If the VIP value is 0.5–1, this means that the importance of the independent variable in explaining the dependent variable is low, and there is a need to increase the sample size or consider other conditions. If the VIP value is less than 0.5, the independent variable has no meaning in explaining the dependent variable. Therefore, the VIP value of the variable can be used to filter out the variables that have greater contribution to the model. The VIP values of the 11 independent variables in the model are shown in Table 5.
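For a single-response PLS model, a commonly used definition is VIP_j = sqrt(p · Σ_a SS_a · w_ja² / Σ_a SS_a), where p is the number of predictors, w_a is the unit-length weight vector of factor a, and SS_a is the variance of Y explained by factor a. The sketch below applies that definition and then the three bands described above. It is illustrative only: the text does not state which VIP formula the software used, and the function names, band labels, and weight values are hypothetical rather than taken from Table 5.

```python
import math


def vip_scores(weights, explained_ss):
    """VIP per predictor.

    weights      -- one unit-length weight vector per extracted factor
    explained_ss -- variance of Y explained by each factor (same order)
    """
    n_predictors = len(weights[0])
    total_ss = sum(explained_ss)
    scores = []
    for j in range(n_predictors):
        weighted = sum(ss * w[j] ** 2 for w, ss in zip(weights, explained_ss))
        scores.append(math.sqrt(n_predictors * weighted / total_ss))
    return scores


def vip_band(vip):
    """Map a VIP value to the three bands used in the text (labels assumed)."""
    if vip > 1.0:
        return "strong"
    if vip >= 0.5:
        return "weak"
    return "none"


# Hypothetical one-factor model with four predictors; the weights have unit length.
weights = [[0.8, 0.4, 0.4, 0.2]]
explained_ss = [1.0]
scores = vip_scores(weights, explained_ss)
bands = [vip_band(v) for v in scores]
```

With a single extracted factor, the explained-variance terms cancel and each VIP reduces to sqrt(p) times the absolute weight, so the VIP ranking simply mirrors the ranking of the weights. The squared VIP values always average to 1, which is why 1 serves as the natural cutoff for an above-average contribution.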

Table 5 Variable importance in projection (VIP) values of the independent variables

Table 5 shows that among the selected variables, the proportion of inflow population to each province (X1) and inter-provincial distance (X3) are the variables most related to the incidence in each province, and their VIP values are 1.24029 and 1.21942, respectively. COVID-19 is caused by people carrying the virus. The main routes of transmission are through droplets and contacts. Hubei became a severe disaster area of the COVID-19 epidemic, and Wuhan was the hardest hit area of Hubei. The difference in the incidence rate during the early stage of the epidemic for each province is closely related to the proportion of the population from Wuhan visiting each province; therefore, this variable has the strongest explanatory power. During the pandemic’s early (later) stages, it was important for the provinces to investigate the population imported from Wuhan (Hubei Province) for a better understanding of the influential factors concerning the inflow of the said population, and use the information to address the source of infection for purposes of prevention and control. The inter-provincial distance (X3) is also very powerful in explaining the incidence. In provinces close to Hubei, such as Jiangxi, Hunan, Chongqing, Anhui, and Henan, the distance between the capital of these provinces and Wuhan, and the incidence of COVID-19 in the province are negatively correlated. Several cities in Hubei Province close to Wuhan (such as Huanggang, Xiaogan, etc.) have become the hardest hit areas of COVID-19, which also shows that distance has a great influence on the incidence. For the cities mentioned, the distance from Wuhan is short, and the flow of various resource elements (human capital elements) between the provinces is more frequent, thereby exacerbating the spread of COVID-19; this finding is in line with the basic law of epidemic spread. According to the law of epidemic transmission, this factor should also be considered for purposes of infection prevention and control.

The VIP values of daily passenger traffic (X4), urbanization rate (X6), and per capita GDP (X5) are also greater than 1, indicating that these three independent variables also play a strong role in explaining the incidence. The number of daily passenger flights reflects the degree of transportation convenience between Wuhan (Hubei) and the provinces (provincial capitals) and the ease of inter-provincial population movement. The more convenient the transportation, the more frequent the inter-provincial contacts and exchanges between Wuhan (Hubei) and the other provinces, including the flow of various resource elements, inter-provincial tourism, and business exchanges between enterprises. A high urbanization rate stimulates the potential for economic development and brings about urban–rural and cross-regional mobility of labor. Therefore, the urbanization rate also shows a significant positive correlation with the incidence rate. GDP is an important indicator of the economic strength and market scale of a country or region. Provinces with a high per capita GDP have greater economic strength and market scale, and their open economies can attract a large amount of labor and bring about labor mobility; this is why the per capita GDP of each province has strong explanatory power for the incidence in each province. In other words, a high per capita GDP does not itself bring about the spread of the epidemic, but provinces with a high per capita GDP are generally more open and have greater population mobility, so their risk of epidemic transmission is greater. Among the three influential factors, the daily passenger transport frequency (X4) relates to the infection source, while the urbanization rate (X6) and per capita GDP (X5) relate to the transmission routes.

In addition, the VIP values of variables such as business environment index (X7), temperature (X10), relative humidity (X11), passenger density (X9), population density (X2), and equity mutual ratio (X8) are all lower than 1, but not below 0.5, indicating that these variables are not very strong in explaining the incidence rate, but they should also be noted. Generally speaking, provinces and regions with a high business environment index have a high level of economic development and a high concentration of talent and labor. The incidence rate in such provinces is relatively high, and the VIP value of this variable is close to 1. According to the epidemiological transmission law, climate factors have a great impact on the spread of respiratory infectious diseases. For example, SARS in 2003 mainly prevailed during the winter and spring, which shows the general characteristics of seasonal and climatic influences on common respiratory infectious diseases. COVID-19 also broke out in late winter and spring, and had characteristics similar to those of SARS in 2003 from the perspective of season and climate. However, there is no scientific conclusion or basis regarding exactly what temperature is the most suitable for the spread of COVID-19. Therefore, although the temperature is related to the transmission of the epidemic, its explanatory power is not very strong. The influence of relative humidity, another natural factor affecting the spread of the epidemic, is similar to that of temperature. Although its explanatory power is not very strong, it also reflects the complexity of epidemic transmission elements under conditions of inter-provincial openness. The correlation between these two natural factors and epidemic spread has also been confirmed in the existing literature on epidemic transmission.

Passenger density reflects the level of activity of passenger transportation on a transportation route and can reflect the degree of population flow in various provinces to a certain extent, but it is not particularly accurate; as such, this variable does not explain the dependent variable very strongly. The VIP value of the variable population density is also lower than 1, possibly because urban and rural population densities are not calculated separately when determining the population density. COVID-19 has spread from urban to rural areas. In cities with a high population density, the risk of person-to-person spread is higher. Transmission of the epidemic in rural areas lags behind that in cities, and rural prevention and control measures (closed roads and closed villages) can effectively confine villagers to a given space, so their effects are relatively obvious. Thus, the explanatory power of population density for the spread of the epidemic is weaker. The VIP value of the equity mutual investments ratio is the smallest among all the independent variables. Under conditions of inter-provincial openness, the equity mutual investments ratio reflects the actual situation of inter-provincial equity mutual investments. The higher the equity mutual investments ratio, the closer the economic ties among provinces. However, equity investments and human resource flow are not proportional to each other; as such, the equity index cannot fully reflect the flow of personnel, nor is it always consistent with the incidence rate. Therefore, the equity mutual investments ratio explains the transmission of the epidemic well in some provinces and municipalities, such as Beijing, Shanghai, Guangdong, and Zhejiang, but not in others, such as Tibet.

Apart from the two natural factors of temperature (X10) and relative humidity (X11), among the other four social factors related to inter-provincial openness, the equity mutual investments ratio (X8), which reflects the equity investments between Wuhan and each province, relates to the source of infection. The other three factors, namely the business environment index (X7), passenger transport density (X9), and population density (X2), mainly reflect the degree of population mobility in the province and therefore relate to the transmission route.
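The reading of VIP values used above can be summarized in a short sketch. The function name `classify_vip` and the labels are hypothetical; the cut-offs of 1 and 0.5 follow the text, while treating values below 0.5 as "weak" is an assumption, since no variable in this study falls in that range. Only the two VIP values quoted in the text are used.

```python
def classify_vip(vip: float) -> str:
    """Label a VIP value using the cut-offs discussed in the text.

    VIP > 1           -> "strong" explanatory power
    0.5 <= VIP <= 1   -> "moderate" (not very strong, but worth noting)
    VIP < 0.5         -> "weak" (assumed label; not discussed in the text)
    """
    if vip > 1.0:
        return "strong"
    if vip >= 0.5:
        return "moderate"
    return "weak"


# Only the two VIP values reported in the text; the others are in Table 5.
reported_vip = {"X1": 1.24029, "X3": 1.21942}

for name, vip in reported_vip.items():
    print(name, vip, classify_vip(vip))
```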

5 Discussion and conclusions

This study uses the PLS regression analysis method, follows the law of epidemic transmission, builds an inter-provincial openness index system, and then analyzes and verifies the composition of the structural elements of epidemic transmission and their impact on epidemic transmission in the context of inter-provincial openness. Among the inter-provincial openness factors that affect the spread of COVID-19, natural factors (e.g., local climate conditions) have less influence, but they also serve as reminders for all localities to prepare and implement epidemic prevention and control measures in advance. The other factors are the proportion of inflow population, inter-provincial distance, population density, urbanization rate, transportation convenience, per capita GDP, business environment index, passenger transport density, and the ratio of mutual equity investments between the two places. These factors affect the spread of the epidemic to varying degrees.

There are three main ways to prevent and control the spread of infectious diseases: by controlling the source of infection, cutting off the route of transmission, and protecting susceptible people. Regarding the control and prevention of COVID-19, in addition to controlling the source of infection and cutting off the route of transmission, provinces have also done a lot of work to protect susceptible people, including letting healthy people stay at home, limiting gatherings, and regulating outside work, business, and school. However, this aspect is similarly conducted by all the provinces and within the same timeframe. The difference lies in the way the various provinces control the source of infection and cut off the transmission route. Therefore, this study focuses more on these two aspects.

Regarding the mechanisms for prevention and control, except for the two natural factors of temperature (X10) and relative humidity (X11), the other social factors related to inter-provincial openness are closely related to the source of infection and the path of transmission in the spread of COVID-19. In addition to the proportion of the population that flows into each province from Wuhan (X1), the inter-provincial distance (X3), number of daily passenger flights (X4), and ratio of equity mutual investments (X8) all indirectly reflect the proportion of the population that flows into each province from Wuhan. Therefore, these four elements relate to the source of COVID-19 infection. The urbanization rate (X6), GDP per capita (X5), business environment index (X7), passenger transport density (X9), and population density (X2) indirectly reflect the degree of population mobility in the province and thus, constitute the transmission route of COVID-19. Based on this analysis, we construct a theoretical framework for the structural elements of epidemic transmission in the context of inter-provincial openness (see Fig. 3 below).

Fig. 3 Theoretical framework of the structural elements of epidemic transmission under conditions of inter-provincial openness

Fig. 4 Population mobility and incidence rate

Fig. 5 Opening and closing & prevention and control

The framework diagram shows the inter-provincial openness indicators that affect the spread of COVID-19. They can be divided into natural and social factors. Apart from the two natural factors, the nine social factors all directly or indirectly point to population mobility. Four of them directly or indirectly point to the population of Wuhan flowing into other provinces, and the other five indirectly reflect population movement within the province. In other words, the difference in the incidence of COVID-19 among provinces under conditions of inter-provincial openness mainly depends on the size of the inflow from Wuhan relative to the respective province's population and on the flow of the population within that province. This is consistent with the study's empirical analysis results and is also in line with the basic laws of epidemic spread (Figs. 4 and 5).
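The grouping described above can be written down as a small lookup table. This is only an illustrative sketch: the dictionary `FRAMEWORK`, its keys, and the helper `category_of` are hypothetical names, while the assignment of X1 to X11 follows the text.

```python
# Grouping of the eleven indicators as described in the text.
FRAMEWORK = {
    "natural": ["X10", "X11"],                        # temperature, relative humidity
    "source_of_infection": ["X1", "X3", "X4", "X8"],  # inflow from Wuhan and its proxies
    "transmission_route": ["X2", "X5", "X6", "X7", "X9"],  # mobility within the province
}


def category_of(variable: str) -> str:
    """Return the framework category of a variable such as 'X4'."""
    for category, members in FRAMEWORK.items():
        if variable in members:
            return category
    raise KeyError(f"unknown variable: {variable}")


# The nine social factors are the two non-natural groups combined.
social = FRAMEWORK["source_of_infection"] + FRAMEWORK["transmission_route"]
print(len(social), category_of("X8"), category_of("X6"))
```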

Overall, inter-provincial openness brings huge challenges to epidemic prevention and control. The more open the provinces are, the greater the possibility of epidemic spread, which is reflected not only in the source of infection but also in the route of transmission. This highlights the difficulty of temporarily “reversing” inter-provincial openness by closing the borders during epidemics. Epidemic prevention and control involves two aspects: one is “prevention and control” and the other is “treatment.” Treatment belongs to the scope of medicine, and we will not discuss it here. “Prevention and control” means implementing isolation measures and moving the region from inter-provincial openness toward a closed state, to control the spread of the virus more effectively. Implementing epidemic prevention and control measures in an open environment has made us more aware of the importance of closing the borders. In other words, a major issue in an open environment is to better understand the elements of openness and how to manage or reverse inter-provincial openness more effectively during epidemics, or even to limit inter-city and city center openness.

Understanding the elements of isolation may help us better observe how an epidemic spreads. This study considered adding quarantine elements to confirm the impact of the elements of inter-provincial openness on the epidemic from the opposite perspective, and a paradoxical result appeared. We found that the incidence of COVID-19 has a positive correlation with the early control efforts of provincial governments. In other words, the greater the government's control efforts (an early first-level response, strong emergency management capabilities, strict traffic control, etc.), the higher the incidence rate. This is contrary to common sense. Why is there such a contradictory result? The main reason is that during the early stage of the COVID-19 spread, each province initiated its first-level response based on the severity of the epidemic, and there was almost no difference in terms of timing. After Wuhan announced the closure of the city in the early morning of January 23, the provinces of Zhejiang, Guangdong, and Hunan took the lead in launching a major public health emergency response to COVID-19 on January 23, while in other provinces, the first-level responses were launched successively on January 24–25. In fact, by this time, the population flow from Wuhan to the provinces had already taken place. During the epidemic's spread, factors such as emergency management capabilities and traffic control were not very different between the provinces. Instead, the provinces that experienced a severe epidemic early on increased their level of epidemic control, which accounts for the positive correlation.

The situation in Hong Kong confirms this finding. On December 31, 2019, after obtaining a notification from the relevant mainland authorities regarding the situation of COVID-19 in Wuhan, the Health Protection Center of the Hong Kong Department of Health issued a press release on the afternoon of the same day. On January 4, 2020, the Hong Kong Special Administrative Region Government announced a contingency plan for the prevention and control of COVID-19 and simultaneously activated a “serious” level response. In terms of response time, Hong Kong initiated the “serious” level of COVID-19 prevention and control 19 days earlier than the first mainland provinces initiated their first-level responses. Hong Kong effectively reduced first-generation imported transmission through early prevention and control, thereby greatly reducing its incidence. Hong Kong's case also confirms the general law of emergency management, that is, “prevention is better than disaster relief.”
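The 19-day lead can be checked directly from the two dates given in the text (January 4, 2020 for Hong Kong and January 23, 2020 for the earliest mainland first-level responses). The variable names below are illustrative only.

```python
from datetime import date

# Dates as stated in the text.
hong_kong_response = date(2020, 1, 4)        # Hong Kong activates its response level
first_mainland_response = date(2020, 1, 23)  # Zhejiang, Guangdong, and Hunan

# Number of days by which Hong Kong's response preceded the mainland's.
lead_days = (first_mainland_response - hong_kong_response).days
print(lead_days)
```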

Limited by many factors, the design of the inter-provincial openness index system is not perfect, and the sample size for the study is insufficient. For example, the special circumstances of individual provinces affect the VIP value of an independent variable, thereby shifting the importance of a certain factor in the spread of the epidemic. In Zhejiang and Jiangsu, nine out of 10 influential factors are roughly similar. The difference is that the inflow population ratio in Jiangsu is 2.00 and the incidence rate is 7.66, while the inflow population ratio in Zhejiang is 1.61 and the incidence rate is 20.34. The gap is especially marked in Wenzhou, Zhejiang, where the incidence rate is almost half that of Zhejiang province, almost the same as the incidence rate of Jiangsu province. The reason for such a large gap between the two provinces lies in the different proportions of the floating population working in Hubei and Wuhan. Wenzhou City in Zhejiang Province, for instance, has hundreds of thousands of people working there. Therefore, the incidence rates of the two provinces differ considerably because of the structure of the population inflow. Such a large disparity also reminds us that under conditions of inter-provincial openness, with the free flow of various element resources, economic development and risk accumulation coexist.
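The size of this disparity can be made concrete by relating incidence to the inflow ratio for the two provinces. The sketch below uses only the four figures quoted above; the "incidence per unit of inflow ratio" is an illustrative quantity introduced here, not a measure used in the study, and the units are those of the original variables.

```python
# Figures quoted in the text for the two provinces.
provinces = {
    "Jiangsu": {"inflow_ratio": 2.00, "incidence": 7.66},
    "Zhejiang": {"inflow_ratio": 1.61, "incidence": 20.34},
}


def incidence_per_inflow(row: dict) -> float:
    """Incidence divided by inflow ratio (illustrative quantity only)."""
    return row["incidence"] / row["inflow_ratio"]


per_inflow = {name: incidence_per_inflow(row) for name, row in provinces.items()}

# How many times larger Zhejiang's value is than Jiangsu's.
relative_gap = per_inflow["Zhejiang"] / per_inflow["Jiangsu"]

for name, value in per_inflow.items():
    print(f"{name}: {value:.2f}")
print(f"Zhejiang / Jiangsu: {relative_gap:.2f}")
```

Although Zhejiang has the lower inflow ratio, its incidence per unit of inflow is roughly three times that of Jiangsu, which is why the structure of the inflow, and not only its size, has to be considered.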

The spread of the epidemic is the result of the combined impact of multiple factors. In addition to the influential factors related to inter-provincial openness analyzed and verified in this study, the spread of the epidemic is also related to the response speed of the prevention and control measures implemented in various provinces, the related strategies and their efficiency, citizen awareness, and other factors such as the degree of cooperation in implementing such measures. Therefore, in the process of epidemic prevention and control, all localities must consider these factors comprehensively. Only when the structural elements that affect the spread of the epidemic are included in the prevention and control system can the path of epidemic spread be cut off and the epidemic be controlled effectively. The main contribution of this study lies in changing the research perspective and in constructing, analyzing, and verifying the structural elements of epidemic transmission in the context of inter-provincial openness, to provide ideas and a reference for epidemic prevention and control as China opens up fully.

Against the background of normalized epidemic prevention and control, and consistent with the theme of this study, we will continue to analyze the factors related to inter-provincial openness that affect epidemic prevention and control and bring these factors into the analysis framework of this study, to further improve our work.