Introduction

This work analyzes the evolution of scientific production, measured by the number of papers and reviews published in different countries (their scientific output), using tools from time series analysis. This allows us to compare rates of scientific development and to identify changes in performance. These changes are associated with dynamical phenomena that affected world scientific production and were felt in many countries simultaneously. We attempt to identify or, at least, suggest which events or circumstances determined those structural changes. Simple descriptive analyses of the time series of production for different countries and different areas of knowledge serve as observers of the dynamics of those areas, as affected by each nation’s emphases and priorities.

This quantitative and qualitative analysis also provides the means to study how changes in the scientific production of one country are related to changes in the production of other nations. This may provide clues about the relative importance of one country within a group or region, from the point of view of changes in production.

There have been numerous controversies regarding the number of papers (the scientific output) as representative of the advancement of science in some country or region. We acknowledge the fact that such a number may not be a perfect measure of the quality of science but, nevertheless, it does show the dynamics of the growth in size and importance of some field of science and technology in a country. In fact, our analysis will consider seven fields of knowledge as well as the aggregate of all papers and reviews appearing in the data base.

Our aim is to develop insights, based on the historical development, of the way the scientific community and the science being done in one country relate to their counterparts in other regions. This could eventually lead to the development of tools for the analysis of the way certain social, economic and scientific events affect the rhythm of growth of scientific production. Whether these changes and events lead to increases in research funding or to heightened social and economic status of researchers and attract more highly qualified personnel is a problem that may be difficult to study based solely on historical data. But it could, perhaps, be profitably analyzed by social scientists, using some of the tools presented here.

It is clear that many factors have affected the scientific output of each country, and continue to do so. These factors, both endogenous and exogenous, and the effect that they have on a country’s scientific production have been studied for many specific countries and periods of time. The impact of research and innovation policies, as well as of trends in the way research is conducted, are major topics of inquiry. Several papers have considered the manner in which these trends and policies affect the local and global dynamics of scientific production; see, for example, Anadón (2012), Kempener et al. (2010), Mabe and Amin (2001) and Rhoten (2004). Some studies, like Foray et al. (2012), have considered the influence that some “Grand Challenges”, generally associated with emblematic mission-oriented science programs like the Manhattan Project or the Apollo Program, have had on the advancement of science. A number of papers have analyzed the historical influence of these programs, and some authors have proposed using them as inspiration for new science policies, for example for research programs aimed at solving problems related to global warming. Sampat (2012) studies the importance of such mission-oriented research as a determinant of the evolution of the National Institutes of Health (NIH). Wright (2012), on the other hand, discusses examples of such mission-oriented programs and their impact on agricultural innovations; he studies the importance of three major programs aimed at encouraging research and the effects that the ensuing innovations had on the capability to respond to increasing world food needs.

More recently, Mazzucato (2015) explored the characteristics of mission-oriented programs, in particular the ways in which they could serve as inspiration for the creation of public incentive programs that actively intervene in shaping the market and encouraging the development of selected, strategic technologies.

Numerous other factors influence the development of science, with sometimes profound effects on the rates of production. Among those factors, the growing importance of interdisciplinary research as well as the ever easier and increasing collaboration between researchers from many countries and world regions are aspects that may lead to strengthening the mutual influences between countries. The many facets and caveats of interdisciplinary research have been analyzed by Rhoten (2004) and Rhoten and Parker (2004). The epistemological challenges inherent to any interdisciplinary research team are presented in Miller et al. (2008). On the other hand, the realities of research being done simultaneously in many centers and by diverse teams have been considered by Collins (1998) and later by Cummings and Kiesler (2008) who tried to establish the characteristics of research teams leading to higher probability of success in distributed research collaboration.

Most analyses of scientific output dynamics have concentrated on particular aspects or fields, at some point in time or for relatively short periods of time. Only a few, among which Larsen and Von Ins (2010) can be cited, have considered the long term evolution of scientific production. They limit themselves to computing growth rates of the world’s production. Bornmann and Mutz (2015) extended this analysis to estimate piecewise rates of growth. They also studied the dynamics of publications in some selected fields: natural sciences and medicine.

In comparing the number of papers published by researchers from different countries, collaboration and interdisciplinary research pose an additional challenge, since an increased number of papers with authors from several countries represents an additional source of mutual influence between countries and fields of knowledge. As the volume of multi-country and multi-discipline publications increases, it will probably become necessary to include these variables in the dynamic models that describe the interactions. Thus far, we have not seen this necessity.

Our objective, therefore, is to study the volume of scientific production from a macro point of view, in order to identify trends and patterns that characterize the dynamics of production of scientific results rather than to analyze the effects of some countries’ internal policies. To this end, we concentrate on the scientometric aspects of scientific production and we try to relate them to social, political and macroeconomic circumstances.

Reviewing the existing literature, it is apparent that the availability of extensive data bases of published papers in almost any area of science, and of the metadata associated with such information, has led to an unprecedented number of analyses of those publications and their interactions. These scientometric studies have been used extensively for at least three different purposes:

  1. To elucidate the structure of science

  2. To assess the performance of researchers

  3. To study the growth of scientific production

Many authors have utilized bibliometric citation data to analyze the relations between fields and subfields of science. Some of these studies were pioneered by Callon et al. (1986), who underlined the need to consider the networks of interactions between scientists, considered as actors in their networks. The concept of mapping the dynamics of science and technology was introduced to describe the relations between scientists and groups of scientists through citations and co-citations. This, in turn, led to the study of how fields of science developed links with other, often almost unrelated, fields through the use of common vocabulary, in what was named co-wording. That work opened the door to numerous, more quantitative, analyses. Recently, a thorough description of all science fields has been produced, leading to the idea of a global structure of science. This has been done independently by Leydesdorff and Rafols (2009), based on factor analysis of a journal-to-journal citation matrix, and by Rosvall and Bergstrom (2008), using an information-theoretic method that approximates the flow of information between fields to identify the modular structure of a network of citations.

The quantitative study of citations and co-citations has led to the measurement of the relative importance of different journals. Several approaches to this problem have been proposed, using more and more detailed information on the number of times that articles in a given journal are cited by papers published in different journals. Among those approaches it is worth mentioning the work of Garfield (1964), which led to the idea of the impact factor. In the same vein, a more recent and elaborate index is the EigenFactor metric (West et al. 2010), which weights the citations according to the importance of the journal where the citing document has been published. A similar metric was developed by the SCImago Research Group and others (2007). Numerous metrics have been proposed to measure the impact of publications. An extensive discussion of different indices can be found, for instance, in Vinkler (2010). Some scientists have tried to use the metrics developed to rank the scientific production of journals, institutions or countries for the evaluation of individual researchers. Several voices have been raised against this. Braun (2010) is clear about the use of small samples to estimate these metrics: “Further, the most flawed metrics can be those that measure the academic performance of individual scientists (as opposed to the performance of a group, institution, nation or journal). In part, this is simply because statistical reliability decreases as the size of the data set decreases”.

Noyons and van Raan (1996) performed a study of the relative importance of different subfields of neural network research in Germany, comparing it to four other countries in their emphases and priorities. They also evaluated the change, from one 2-year period to the next, of one activity index in each subfield. In another dynamic analysis, Zhou et al. (2009) used an approach that is somewhat similar to ours in their study of regional differences in Chinese scientific production. However, they were not interested in the dynamic features of the evolution but only in the participation of each region in the country’s output. The recent history of China’s share of the world’s publications was considered by Zhou and Leydesdorff (2006) in their analysis of the importance of the USA to world science. They studied one decade’s data on the production by national authors and the citations to these works. Only a short term analysis was attempted. No dynamical models were considered.

A complex nonlinear dynamical system approach was used by Leydesdorff (2013) to estimate the information content of short term bibliometric data. He advocated the use of entropy statistics and multidimensional scaling to discover hidden structures or relations between fields of knowledge.

Finally, some authors have considered the use of the time series describing the number of publications or the number of cites to estimate the rate of growth of science or, at least, of the world’s scientific output. See, for instance, Larsen and Von Ins (2010), as we mentioned above. We have not found any study comparing the long term history of several countries’ scientific production. Only the values at some instant of time have been analyzed.

Data and methods

Our analysis is based on the premise that scientific progress and development is a dynamic process where the future state of the system is heavily influenced by the present and past values of that state. Although this may seem obvious to those studying the evolution of research and researchers and the results of their work, it is nevertheless often ignored when modeling the production process. Therefore, we attempt to develop models that could help to predict the future results of research, given current and past results. This approach, based on dynamical systems, requires the use of historical data on the evolution of results to calibrate prospective models.

Data

This work is based on a time series description of the number of papers and reviews that appear in the Scopus data base (Scopus 2017). The records cover the period from 1823 to 2015. Data from the period after 2015 were not considered, to avoid the danger of using data that may be unstable. For some analyses, the series beginning in 1930 were studied. For other computations, especially where population or economic information was used, the series were considered from 1960 to 2015. All Scopus data were downloaded on February 2nd, 2017.

A paper is considered as part of the scientific production of one country if at least one of its authors is associated with an institution from that country. This, of course, means that there may be numerous papers that have been counted more than once for different countries and, often, several times. For instance, there are more than 250,000 papers with authors from the USA and China; there are more than 14,000 papers with authors from the USA, the United Kingdom and China, etc. This implies that the data from different countries may be correlated at a given time. And, in fact, some of the joint variability is undoubtedly due to these co-authorships. However, it makes no sense to assign a paper to only one of its authors when there are several involved.
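The counting rule just described (one count per country for every paper with at least one author affiliated there) can be sketched as follows. This is only an illustration: the function name `count_by_country` and the sample records are hypothetical and are not taken from the Scopus data.

```python
from collections import Counter


def count_by_country(papers):
    """Whole counting: a paper adds one to each distinct affiliation country."""
    counts = Counter()
    for affiliations in papers:
        # set() ensures a paper with several authors from the same
        # country is counted only once for that country.
        for country in set(affiliations):
            counts[country] += 1
    return counts


# Hypothetical records: affiliation countries of the authors of each paper.
papers = [
    ["USA", "CN"],
    ["USA", "USA", "UK"],
    ["CN"],
    ["USA", "UK", "CN"],
]
counts = count_by_country(papers)
total_attributions = sum(counts.values())
```

In this toy example the four papers generate eight country attributions, which is the kind of overlap that makes the series of different countries correlated at a given time.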

The twelve countries with the highest number of papers and reviews for 2015 were selected for the study. Those countries are referred to by their internet country code, as follows: Australia (AU), Canada (CA), China (CN), Germany (DE), Spain (ES), France (FR), India (IN), Italy (IT), Japan (JP), South Korea (KR), United Kingdom (UK) and the United States, identified as USA.

In addition, due to personal interest by the authors, six countries from Latin America were studied, although their results are only mentioned briefly. These countries are: Argentina (AR), Brazil (BR), Chile (CL), Colombia (CO), Mexico (MX) and Venezuela (VE).

The time series of the total number of papers that include at least one author from some country in a given year were obtained as well as partial series for the following subjects in the data base: Agricultural and biological sciences (Agri); Biochemistry, genetics and molecular biology (Bioc); Chemistry (Chem); Engineering (Engi); Medicine (Medi); Physics and astronomy (Phys); Social science (Soci). These topics were selected due to the number of publications on the subject at some, arbitrarily chosen, instant of time.

There are also overlaps in the data for fields of knowledge. Many interdisciplinary works are counted more than once, as happens with the affiliation countries.

A data base of the evolution of the population and of economic information, including the Gross Domestic Product (GDP) of the countries being considered, was also obtained from the World Bank Data Bank (The World Bank 2017). This information is only available from 1960 to the present.

Methods

Our purpose is to present the application of some standard time series methods to the study of the dynamics of scientific production. However, space and scope limitations do not allow detailed explanations of the modeling process. All model building methods require that historical data be prepared to suit the analytical framework. In addition, the modeling process should proceed in an iterative manner until statistical significance is achieved. The resulting models should satisfy some stability and statistical independence conditions before they can be accepted. These checks should be carried out before accepting any results, but we do not discuss them in detail in our exposition. The interested reader should consult one of the general references on time series, e.g., Brockwell and Davis (2016) or Enders (2004), for the details.

For the rest of our exposition, we represent by \(y_t\) the value of one variable of interest, from the point of view of the scientific production, at time t. The value \(y_t\) might represent, for example, the number of papers published on a given subject, with authors from a given country, during year t.

Some of the more qualitative analyses are based on the original, raw data, without any transformations. However, in order to simplify computations and to allow easier comparisons of related values, a logarithmic transformation was performed on all the data. Given the value \(x_t\) of a time series at time t, a derived variable is obtained as:

$$\begin{aligned} y_t = \log (1+x_t) \end{aligned}$$
(1)

where one unit was added to eliminate the possibility of a zero value leading to an infinite logarithm.

In addition, to eliminate linear trends and to obtain a stationary time series, the series was differenced as:

$$\begin{aligned} \varDelta y_t = \log (1+x_t) - \log (1+x_{t-1}), \quad t=2, 3, \ldots \end{aligned}$$
(2)

This gives, approximately, the relative (percent) change from one period to the next; the approximation is good when the change is small. This operation also serves to eliminate any low frequency trends.
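A minimal sketch of the transformations in Eqs. (1) and (2) is given below. It assumes natural logarithms (the base is not stated in the text), and the sample counts are hypothetical, not Scopus values.

```python
import math


def log_transform(x):
    """Eq. (1): y_t = log(1 + x_t), using the natural logarithm (assumed)."""
    return [math.log(1 + value) for value in x]


def first_difference(y):
    """Eq. (2): delta y_t = y_t - y_{t-1}, for t = 2, 3, ..."""
    return [current - previous for previous, current in zip(y, y[1:])]


# Hypothetical yearly paper counts; the leading zero shows why 1 is added.
x = [0, 9, 99, 120, 150]
y = log_transform(x)
dy = first_difference(y)
```

The differenced series has one element fewer than the original, and a year with no papers maps to zero instead of an undefined logarithm.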

An Augmented Dickey–Fuller test is used to check the stationarity of a series. Stationarity is required for the development of many dynamic models; it basically means that the statistical characteristics of the series (mean value and variance) do not vary with time.
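For reference, the regression behind the Augmented Dickey–Fuller test can be written in its usual textbook form as shown below. The symbol \(z_t\) for the series being tested, the number of lags m and the inclusion of the constant \(\alpha\) and trend \(\beta t\) are notational assumptions made here; the text does not specify which variant was used.

$$\begin{aligned} \varDelta z_t = \alpha + \beta \, t + \gamma \, z_{t-1} + \sum _{i=1}^{m} \delta _i \, \varDelta z_{t-i} + \varepsilon _t \end{aligned}$$

The null hypothesis is \(\gamma = 0\) (the series has a unit root and is not stationary), tested against the alternative \(\gamma < 0\).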

Descriptive statistics of time series

Descriptive statistics simply present the most basic information on the series being considered. We will discuss several tools that were employed in the analysis. Only the simplest description will be done. The interested reader may consult any textbook on time series, e.g., Box et al. (2015) for the details and definitions.

Time series line plot The most elementary tool for the analysis of time series is the plot of one or more time series against time. Figure 1, for instance, shows the time variation of scientific production in 12 countries. All variables have been transformed according to Eq. (1). This simple tool will be used extensively in our analyses.

Autocorrelation function Quantifies how strongly correlated the current values of a variable are with future values of that same variable. Figure 4 shows the autocorrelation functions of the number of papers for two countries.
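As a rough sketch, the sample autocorrelation function can be computed as below. The estimator shown (deviations from the overall mean, divided by the number of observations) is one common convention and is assumed here; the function name and the sample series are hypothetical.

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag (requires max_lag < len(series))."""
    n = len(series)
    mean = sum(series) / n
    dev = [value - mean for value in series]
    # Lag-0 autocovariance, used to normalize every other lag.
    c0 = sum(d * d for d in dev) / n
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
        for k in range(max_lag + 1)
    ]


# Hypothetical steadily growing series.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
acf = autocorrelation(series, 3)
```

By construction the value at lag 0 is 1, and all other values lie between -1 and 1.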

Cross correlation function Quantifies how strongly correlated the values of one series \(x_t\) at time t are with the values of another series \(y_t\) at some other time \(t+\tau\), as a function of the lag \(\tau\); i.e., the correlation between \(x_t\) and \(y_{t+\tau }\). When the lag (\(\tau\)) is positive, the \(y_{t+\tau }\) are future values and, when it is negative, they are past values.
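The definition above can be illustrated with the following sketch, which estimates the correlation between \(x_t\) and \(y_{t+\tau}\) for a single lag. The normalization by the full-sample standard deviations is an assumed convention, and the two sample series are hypothetical (the second is the first delayed by one period).

```python
import math


def cross_correlation(x, y, lag):
    """Sample correlation between x_t and y_{t+lag}; lag may be negative."""
    n = len(x)
    if len(y) != n or abs(lag) >= n:
        raise ValueError("series must have equal length and |lag| < length")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [value - mean_x for value in x]
    dy = [value - mean_y for value in y]
    sd_x = math.sqrt(sum(d * d for d in dx) / n)
    sd_y = math.sqrt(sum(d * d for d in dy) / n)
    if lag >= 0:
        pairs = zip(dx[: n - lag], dy[lag:])   # x_t with y_{t+lag}
    else:
        pairs = zip(dx[-lag:], dy[: n + lag])  # x_t with earlier values of y
    return sum(a * b for a, b in pairs) / n / (sd_x * sd_y)


# Hypothetical series: y repeats x one period later.
x = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
y = [0.0, 0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0]
ccf_plus_one = cross_correlation(x, y, 1)
ccf_zero = cross_correlation(x, y, 0)
```

Because y follows x with a delay of one period, the correlation is large at lag +1 and close to zero at lag 0.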

Vector autoregressive (VAR) models

When there are multiple, related, time series it is possible to have mutual influences between several variables. Computation of the cross correlation function of scientific production from several countries shows that there are significant correlations for positive and negative lags. This suggests that the scientific output from different countries could be influenced by the past values of production from some, possibly numerous, countries.

Different types of models have been designed for this situation. We use the so-called vector autoregressive (VAR) model, where each variable is considered to be able to influence future values of all variables, including itself. The general form of a VAR model for the problem at hand is obtained by defining properly chosen vectors and matrices as:

$$\begin{aligned} X_t= A_0+A_1\,X_{t-1}+A_2\,X_{t-2}+\cdots +A_p\,X_{t-p}+\varepsilon _t \end{aligned}$$
(3)

where \(X_t= (x_{1,t}, x_{2,t},\ldots , x_{k,t})^{\prime}\) is a (column) vector of variables describing the production of the countries under study, where \(x_{i,t}, i=1,\ldots ,k\) is the production of country i at some time t in a given subject. Matrices \(A_j, j=1,\ldots ,p\) quantify the effects that lagged values of all the variables have on the current values. \(A_0\) is a constant vector and \(\varepsilon _t\) is a random white noise vector. See, e.g., Enders (2004) for details on the definitions and the computations necessary for estimating the parameters of the model.
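To make the structure of Eq. (3) concrete, the sketch below evaluates its right-hand side with the noise term set to zero, which gives a one-step-ahead forecast. The coefficient values, the two-variable setting and the function name `var_forecast` are hypothetical; estimating the matrices from data is a separate step that is not shown.

```python
def var_forecast(a0, a_mats, history):
    """One-step forecast from Eq. (3) with the noise term set to zero.

    a0: constant vector of length k.
    a_mats: list [A_1, ..., A_p] of k x k matrices (nested lists).
    history: past vectors ordered from oldest to newest, at least p of them.
    """
    k = len(a0)
    p = len(a_mats)
    if len(history) < p:
        raise ValueError("need at least p past observations")
    forecast = list(a0)
    for j, a_j in enumerate(a_mats, start=1):
        x_lag = history[-j]  # X_{t-j}
        for i in range(k):
            forecast[i] += sum(a_j[i][m] * x_lag[m] for m in range(k))
    return forecast


# Hypothetical VAR(2) for two countries.
a0 = [0.1, 0.2]
a1 = [[0.5, 0.1], [0.0, 0.4]]
a2 = [[0.2, 0.0], [0.1, 0.1]]
history = [[1.0, 2.0], [3.0, 4.0]]  # X_{t-2}, X_{t-1}
x_next = var_forecast(a0, [a1, a2], history)
```

The off-diagonal entries of each matrix are the terms through which the past output of one country enters the forecast for another.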

Once a VAR model has been obtained, it is possible to forecast future values of the set of data. It is also possible to study the dynamic coupling problem where a change in one of the variables produces dynamic effects on the others. The tool for solving this problem is the Impulse Response Function (IRF) that gives an estimate of the dynamic response of, for instance, the scientific output of a set of countries, when one of them undergoes a sudden and temporary (an impulse) increase of value.
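The idea of an impulse response can be sketched with the simple (non-orthogonalized) recursion \(\Psi_0 = I\), \(\Psi_h = \sum_{j=1}^{\min(h,p)} A_j \Psi_{h-j}\). Whether the analysis in this paper uses this form or an orthogonalized variant is not stated, so the choice here is an assumption, as are the coefficient values.

```python
def impulse_responses(a_mats, horizon):
    """Non-orthogonalized impulse responses Psi_0..Psi_horizon of a VAR(p).

    Psi_h[i][m] is the response of variable i, h periods after a unit
    impulse applied to variable m.
    """
    k = len(a_mats[0])
    p = len(a_mats)

    def matmul(a, b):
        return [
            [sum(a[i][n] * b[n][m] for n in range(k)) for m in range(k)]
            for i in range(k)
        ]

    identity = [[1.0 if i == m else 0.0 for m in range(k)] for i in range(k)]
    psi = [identity]
    for h in range(1, horizon + 1):
        acc = [[0.0] * k for _ in range(k)]
        for j in range(1, min(h, p) + 1):
            prod = matmul(a_mats[j - 1], psi[h - j])
            acc = [[acc[i][m] + prod[i][m] for m in range(k)] for i in range(k)]
        psi.append(acc)
    return psi


# Hypothetical VAR(1) for two countries.
a1 = [[0.5, 0.1], [0.0, 0.4]]
irf = impulse_responses([a1], 2)
```

For a VAR(1) the responses are simply successive powers of \(A_1\), so with coefficients of this size the effect of a temporary shock fades over time.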

Results

First, we consider the comparative time evolution of scientific production for several countries, based on the number of papers and reviews published. The overall picture can be seen in Fig. 1. The scale of the dependent variable (vertical axis) is logarithmic, which means that a linear increase represents a constant annual growth rate. The per capita production (more precisely, the number of papers per million people) was also studied; this modified the ordering of the graphs, but the dynamic features were preserved. The time plot of output from the twelve countries shows one very interesting feature: output has grown steadily for all countries, with rare, brief and often small declines.
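The link between a straight line on the logarithmic scale and a constant growth rate can be made explicit. Assuming, for illustration, that output grows at a constant annual rate g from an initial value \(x_0\) (both symbols are introduced here only for this example):

$$\begin{aligned} x_t = x_0 \, (1+g)^t \quad \Longrightarrow \quad \log x_t = \log x_0 + t \, \log (1+g) \end{aligned}$$

so the logarithm of output is a linear function of time with slope \(\log (1+g)\).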

Fig. 1 Scientific output time series. Notice that a linear increase in the logarithm of the number of papers corresponds to an exponential growth in the actual numbers

Overall dynamics

This first part will consider the most elementary analysis, based only on the examination of the time plot of the production of several countries. Despite the simplicity of the tool, it is possible to find some facts that can only be appreciated when the time variations of production are considered concurrently.

The World War II (WWII) dip

There is one important dynamic behavior that immediately catches the eye: the distinct reduction of scientific output for almost all countries during the World War II period. This reduction is more conspicuous for Germany (DE), Japan (JP) and Italy (IT), probably as expected. The production of the United Kingdom (UK), the United States (USA) and Canada (CA) was less affected, although these countries also saw significant reductions. A curious case is France (FR), whose output shows an important contraction before all the other countries involved in the conflict. The overall reduction was to be expected under war conditions, characterized by secrecy and by material and personnel constraints. It is nonetheless remarkable that the United Kingdom was able to sustain the production of scientific works with only a minor reduction while other countries involved in the conflict experienced much more pronounced contractions. Canada and the USA behaved similarly to the United Kingdom, but they were, of course, farther from the war scene and the results are more easily explained.

Another result that is apparent from the figure is the effect of the war and pre-war conditions on German science. Before the war, the output of Germany was second only to that of the USA. However, starting in 1931 it began to decline steadily until the end of the war. After that, it was outpaced by the United Kingdom and it never recovered its place.

Evident from this comparison of the evolution of the countries’ scientific output is the fact that the only large scale reduction in worldwide production during the period starting in 1930 was caused by the World War. No other economic or social shocks have been as damaging to the progress of science, at least, in terms of the number of papers published.

The 1973 upsurge

Another “event” that is readily apparent from the time series in Fig. 1 is a sharp increase in the scientific production of almost all countries that occurred between 1972 and 1974. The line plot shows that the increase affected all countries in the set except China, and the change was very large for some of them. Spain, for instance, increased its overall production by 670%; in France production grew by 248% and in Germany by 148%. Even the USA saw growth of 58% during that period. As far as we know, this jump has not been analyzed in the literature on the history and sociology of science.
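For jumps of this size, percent changes and the log differences of Eq. (2) are far from equal, which is worth keeping in mind when reading plots on the logarithmic scale. The sketch below converts the percentages quoted above into log changes; it assumes natural logarithms and that an increase of p% means a ratio of 1 + p/100, and it ignores the unit added in Eq. (1).

```python
import math


def percent_to_log_change(percent_increase):
    """Log of the ratio implied by a percent increase (natural log assumed)."""
    return math.log(1 + percent_increase / 100)


# Percent increases between 1972 and 1974 as quoted in the text.
reported = {"ES": 670, "FR": 248, "DE": 148, "USA": 58}
log_changes = {
    country: percent_to_log_change(percent)
    for country, percent in reported.items()
}
```

Under these assumptions the 670% increase for Spain corresponds to a log change of about 2.04, and the 58% increase for the USA to about 0.46.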

An analysis by field of knowledge reveals that this sudden change was due mainly to the increase in the number of publications in Medicine. Papers in Biochemistry and Agriculture also contributed to the sudden growth. This may be easily seen in Table 1 and in Fig. 2, and a correlation analysis of variations during the period confirms it. Other areas of knowledge did not experience such hikes. Figure 2 also shows that, during the period 1973–1974, the main source of variation in the number of publications was the medical field. There is a clear association between the oscillation of the number of papers in medicine and the total number. This occurred again from 1984 to 1988, although that time engineering also contributed.

Table 1 Percent variation of output from 1972 to 1974 for the most significant countries and subjects

We also considered the contribution of various topics to the upsurge in medical papers. The most significant factor here appears to be the number of papers on heart and coronary diseases. These experienced a sudden increase in the USA in exactly the period that we are studying. Another source of growth is the publications on cancer.

Fig. 2 Combined scientific output of the twelve most productive countries by field of knowledge. The \({\mathsf{\_All}}\) suffix refers to the number of publications in all fields (this notation will be used in several graphs and plots, henceforth)

The reasons for this upsurge in the publication activity of almost all countries are not entirely clear to us. It is reasonable to assume that it responded to some increase in the availability of funds for financing research. However, an examination of national research budgets for the countries involved does not show a consistent trend during the preceding years. In 1971, after a strong campaign by interest groups and members of the press, the president of the USA signed the National Cancer Act, which gave rise to the National Cancer Program and led to the creation of fifteen new cancer research and treatment centers. As a result, the number of publications on cancer rose significantly during the next few years. This, however, does not explain the sudden increase in the number of papers in other countries, nor does it explain the increase in areas other than cancer.

Another plausible explanation for this behavior might be related to a renewed public interest in medical and biological research resulting from the exciting new results in basic genetics and molecular biology following the discovery of the structure of DNA and subsequent research leading to a new and deeper understanding of cell biology. The fact is that the years from 1967 to 1977 witnessed some very exciting events in medical history. This period could be called the golden age of vaccine development, with mumps, rubella, chicken pox, pneumonia and meningitis vaccines being developed. The first human heart transplant was also performed during this period. Certainly all this research activity was well worth reporting. However, the reason why the sudden rise in the number of papers occurred precisely in 1973–1974 and why it happened simultaneously in so many countries is not yet clear.

We have also investigated the possibility that the number of journals could have increased significantly in 1973, thus making it easier to publish a paper. However, we have not found convincing evidence in that sense.

The 1995–2002 swing

One last period worth noting in the time evolution of the number of publications is the swing effect that appeared in 1995 and ended in 2002. At the start, in 1995, almost all the countries experienced an increase in the number of papers. However, the next year the trend ceased and led to reductions or slower growth in the total number of publications for all countries in this study. Of all the countries, China recovered most quickly. The USA resumed growth in 2001 and, by 2002, the slowdown had ended. The time frame coincides with the dot-com bubble (1997–2001), which saw stock prices soar very rapidly, especially for companies in the computer and information technology sector. There seems to be little doubt that this swing in scientific production was also influenced by the sudden increase in funding for technological companies and research and, later, by the ensuing contraction.

This explanation is reinforced by the observation that the number of papers published grew and then declined in the medicine, biochemistry and engineering fields and in many countries.

The effect of this crisis was more severe in the USA than in other countries. In fact, the engineering field took a blow from which it has never recovered. The number of publications in engineering by USA authors slowed down and stayed down after the crisis, as Fig. 3 shows. In fact, in 2004 Chinese authors surpassed American authors in the number of engineering papers published, and the gap has widened since: by 2015, Chinese-authored papers outnumbered their American counterparts by a factor of 2.5. Even when Chinese language publications are discounted, the advantage is very large. This is the only instance, among the subject areas being considered, where China outpaces the USA.

Fig. 3 Engineering publications during and after the dot-com period. Notice the sharp increase in the production of China. Notice the linear scale

The Japanese slowdown

The time evolution of Japanese authors’ publications tells an interesting story. As the curves show, after the end of WWII the number of Japanese-authored papers grew much faster than that of any of the Allied countries, except the USA and the United Kingdom. Only Germany recovered its scientific production faster than Japan. After a period of sustained growth, by 1989 Japan had taken second place in the number of papers published. It surpassed the United Kingdom in that year and maintained a slight but significant lead. However, in 2003, probably due to the economic crisis, it lost second place to the United Kingdom again, and in 2005 it was surpassed by Germany and China. In fact, since 2000 the Japanese scientific output has remained stagnant, with virtually no growth. Later, in 2014, India too overtook it.

In engineering, the story is even more dramatic. Japan had taken second place in scientific output in 1979 and kept it until 2000, when it lost it to the United Kingdom; then, in 2001, it fell to fourth place, losing its position to Germany. At the end of the period, India (in 2014) and Korea also surpassed Japan in engineering output.

Examination of the economic time series reveals that the Gross Domestic Product (GDP) of Japan suffered a severe shock in 1991 and an even more serious one in 2007. Arguably, this long crisis has affected Japanese science and engineering significantly.

The China case

We end this section with a brief description of the time evolution of Chinese publications. In Fig. 1, the singularity of the history of the publications with Chinese authors may be appreciated. After being affected by the WWII dip like so many countries, its scientific production saw an even deeper reduction, starting in 1949. The output remained very low (less than 100 papers per year) up to 1978. At that time, it took a path that has no equivalent in modern science. During the period from 1978 to 2015, the annual growth has been positive all the time, except in 1994 and 2002. During the period, the number of papers published surpassed that of all countries except the USA. The growth has been evident for some time. Zhou et al. (2009) said it clearly: “there is no sign showing China’s growth rate will slow down or stop”. Today, almost 10 years later, no such sign is apparent. And in some areas growth is faster now.

We think that the different periods may be understood in light of political circumstances. After the war, the change of political regime in 1949, along with China’s involvement in the Korean War, was probably crucial for the stagnation of Chinese science. Later, in 1978, coinciding with the political upheaval at the time, a change of direction became evident.

The recent dynamics have been even more evident in the field of engineering, where Chinese publications have exceeded those from all other countries since 2004. In fact, in 2015, China’s engineering paper output outnumbered that of the other eleven countries (in the set of the twelve with the highest production) combined. Some authors have argued that these papers are not as important or influential because of their quality. But the number is staggering. During the period 2000–2015, the growth in the number of engineering papers was 8 times greater than the USA’s. And many of those papers have been published in the most prestigious journals. Our opinion is that Chinese engineering may become more and more important in the near future.

Auto and cross correlation analysis

One result that we find enlightening about the dynamics of scientific production is illustrated by the autocorrelation functions of the number of papers for different countries. Comparing the autocorrelations for the whole set of countries, including Latin American countries, we observe that this function may be used to characterize how stable the dynamics of scientific production are. All countries with a long history of scientific production show a high correlation between current output and the production of past years. For countries like the USA there are significant correlations with up to 15 past values, and the values decrease approximately linearly with the lag. In countries with a shorter tradition of scientific development, the autocorrelations are different: they decrease more rapidly, in an approximately exponential way, and the correlation is significant only for lags of a few years.

Our results show that the form of this autocorrelation function is a good predictor of the length of the scientific tradition of each country. This is probably related to the stability of financing and of researchers’ appointments. It is also indicative of the characteristic time scale of changes in the country’s production. For instance, in China, up to 1980, there were significant correlations only up to \(\tau = 3\) years and the length increased to 4 in 1990, 5 in 2000 and 7 in 2015.

Fig. 4

Autocorrelation functions of scientific output for a developed (USA) and a developing (BR) country. The gray band is a 95% confidence interval for the coefficients. Values out of the gray band are significantly nonzero

Figure 4 illustrates the difference in autocorrelations between the USA and Brazil. Clearly, current production in the USA is more highly correlated with past values (up to 17 years), while that of Brazil is correlated only with the values of the last 8 years. This could be regarded as a kind of inertia that impedes or slows down sudden changes but also confers greater robustness on the system.
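The idea of counting significant lags can be illustrated with a small sketch. The Python code below is not the procedure used to produce Fig. 4. It computes the sample autocorrelation of a series and counts the consecutive lags that fall outside an approximate 95% band of ±1.96/√n. That band is a white-noise approximation assumed here for simplicity, and the band drawn in Fig. 4 may have been computed differently. The function names are hypothetical.

```python
import math


def autocorrelation(series, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squares (normalizer)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


def significant_lags(series, max_lag):
    """Count consecutive lags (starting at lag 1) outside the band.

    Assumption: the band is +/- 1.96 / sqrt(n), the usual
    white-noise approximation of a 95% confidence interval.
    """
    bound = 1.96 / math.sqrt(len(series))
    count = 0
    for r in autocorrelation(series, max_lag)[1:]:
        if abs(r) <= bound:
            break
        count += 1
    return count
```

Applied to a country's annual paper counts, a larger count from `significant_lags` would correspond to the longer memory described above for countries with a long scientific tradition.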

Cross correlation functions quantify how strongly correlated the production \(x_t\) of one country at time t is with future (\(\tau >0\)) or past (\(\tau <0\)) values of production \(y_{t+\tau }\) in another country. Our results show that the form of the cross correlation function quantifies the mutual influences between the two variables. Figure 5 displays the cross correlations between the USA and Brazil (a) and between the USA and Germany (b). Notice the very different nature of the relationships. Current values of production for the USA are less correlated with past values of Brazil’s production than with future values. This is equivalent to saying that the influence of the USA on Brazil is stronger than the influence of Brazil on the USA. The cross correlation of the USA and Germany is more symmetric, although a slight bias toward the USA is perceptible. Of course, high correlation does not necessarily imply causality, although correlation with past values may provide a more credible clue. This weakness is common to all econometric models.
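The sign convention above can be made concrete with a short sketch. The Python function below is an illustration under stated assumptions, not the code behind Fig. 5. It pairs \(x_t\) with \(y_{t+\tau }\) and normalizes by the full-series standard deviations, which is one common sample estimator. The function name is hypothetical.

```python
import math


def cross_correlation(x, y, lag):
    """Sample correlation between x[t] and y[t + lag].

    lag > 0 pairs x with future values of y; lag < 0 with past values.
    Assumption: both series have the same length, and the full-series
    means and standard deviations are used for normalization.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    num = sum(
        (x[t] - mx) * (y[t + lag] - my)
        for t in range(n)
        if 0 <= t + lag < n  # keep only pairs inside the sample
    )
    return num / (sx * sy)
```

If one series leads the other, the function returns larger values on one side of \(\tau = 0\) than on the other, which is the asymmetry discussed for the USA and Brazil.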

Fig. 5

Cross correlations of the scientific productions of a Brazil, b Germany, both against the scientific production of the United States

Structure of scientific production

Our last analysis of the scientific production for some of the most productive countries is a comparison of the fraction of papers published in one of seven areas with respect to the total number of papers. This was done for the seven fields of knowledge, described elsewhere. In this case, we compared the fractions at a given time (2015) and tried to find similarities between different countries.

Figure 6 displays the composition of the national science output for several countries in seven subjects. The structure of the production shows that there are two distinct groups according to the contribution of the different areas to the national output. One group shows a clear inclination for the medical sector. The second group presents a more balanced picture.

Fig. 6

Composition of 2015 output by subject in a Western, b Eastern, countries

The first group of countries, shown in Fig. 6a, comprises all Western countries and Japan and has a very similar structure, with a contribution from medical literature approaching 45% of the total. The second most important field is Biochemistry and molecular biology. The contribution from physics is less homogeneous. The fraction of engineering publications remains close to 12%, with a slightly higher participation (15%) in Japan. Social science is almost nonexistent in Japan, but in the other countries its fraction is similar to that of engineering.

A second group of countries, whose compositions are shown in Fig. 6b, includes China, India and Korea. This group is characterized by a greater contribution from engineering, especially in China, where it reaches nearly 30%. All three countries have very low participation from Social Science. The contributions from Agriculture, Biochemistry and Chemistry are very similar across the group. The fraction of Medicine papers is, in all three countries, significantly lower than in the Western countries.

From Fig. 6, it is clear that the two groups of countries differ in the contributions of the fields to their total production. In all Western countries medicine is the most important subject, while engineering papers represent a fraction that is very close to that of social science publications. On the other hand, all Eastern countries produce very little in social science, while engineering makes a more important contribution. In the group of China, India and Korea, engineering publications are more numerous than medical ones.

Upon close examination of Fig. 6a, b, one can see that the most important difference is the contribution of publications in the medical field. In the Eastern countries, this place has been taken by the engineering field. One could seek an explanation in the fact that Korea, China and India have based their development very strongly on technology-based industry. It could also be argued that Western-type medicine has to compete seriously with Chinese traditional medicine in that part of the world and, perhaps, has not attained the status it enjoys in the West. Another plausible explanation could be that academic communities in Eastern countries are more recent and medical research has not grown as rapidly as technological research. The similarity of the structures for the Western countries is probably related to a long and strong coupling, due to the marked influence of a few leading countries during the last two centuries. More detailed studies might be able to clarify this point.

Fig. 7

Evolution of the fraction of publications by areas. The radial scale shows the contribution of the area to the aggregated total scientific output of twelve countries during each period

Since the group called here Eastern is growing more rapidly than the rest, a question arises as to whether the overall composition of the world output has changed recently. To answer this question, the composition by fields was analyzed for three different periods of time. We considered the composition of the combined output of the twelve countries that we have studied so far, using 15-year periods ending at \(t= 1985, 2000, 2015\). This was repeated for the seven most productive fields of knowledge. The results appear in Fig. 7. The scale quantifies the fraction of the total production of the period represented by each individual field.

As suspected, the composition of the global output has changed over time. The contribution of medicine to the production of each period has been decreasing, from 43.7% in 1971–1985 to 30.4% in 2001–2015. The contribution of engineering publications, on the other hand, has grown from 6.9 to 13.8% during the same period. At the same time, the contribution of materials science has roughly doubled, from 4.9 to 9.9%. In light of the rate of growth of production in China, India and Korea and the composition of production in those countries, it is likely that the contribution of engineering and materials science will increase steadily during the next few years. It is also probable that medicine will contribute less in the future. At least, that seems likely for the short term, an impression that is reinforced when the rates of growth of two of the main fields, medicine and engineering, are compared, as in Fig. 8. It becomes clear from that figure that the countries with the faster growing production are more involved in engineering than in medicine.

Fig. 8

Scientific production for six countries in a medicine, b engineering

Effects of political upheavals

The analysis of time series permits an estimation of the effects of some political disruptions on scientific production. Our analysis shows, for example, that authoritarian regimes have been detrimental to the development of science and technology, as evidenced by the effects that political upheavals caused in China. The case of Nazi Germany reinforces this idea: Soon after the National Socialist Party rose to power in 1933, the scientific output of Germany suffered a significant setback.

In Latin America two cases are illustrative, as shown in Fig. 9. First, the growth in the number of publications by Chilean authors slowed down severely as a consequence of the 1973 military coup. Figure 9a displays the logarithm of the number of papers published. As mentioned before, the slope of the curve represents the annual rate of growth. The production histories before and after the coup are clearly different. Chilean science later recovered, after the dictatorship was over.
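Reading a growth rate off the slope of a log-scale curve can be sketched as follows. This Python code is a minimal illustration, assuming an ordinary least-squares line fitted separately to the logarithm of the counts before and after a chosen break year. The function names and the choice of a simple two-segment fit are assumptions, not the procedure used for Fig. 9.

```python
import math


def log_slope(years, counts):
    """Least-squares slope of ln(count) against year.

    For a series growing at a constant rate g per year,
    the slope equals ln(1 + g).
    """
    logs = [math.log(c) for c in counts]
    n = len(years)
    mean_year = sum(years) / n
    mean_log = sum(logs) / n
    sxx = sum((y - mean_year) ** 2 for y in years)
    sxy = sum((y - mean_year) * (v - mean_log) for y, v in zip(years, logs))
    return sxy / sxx


def growth_before_after(years, counts, break_year):
    """Slopes of the log series before and from the break year onward."""
    before = [(y, c) for y, c in zip(years, counts) if y < break_year]
    after = [(y, c) for y, c in zip(years, counts) if y >= break_year]
    return log_slope(*zip(*before)), log_slope(*zip(*after))
```

A clear drop from the first slope to the second would correspond to the kind of slowdown described for Chile after 1973.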

Fig. 9

Latin American examples of the effect of authoritarian regimes. a Effect of the 1973 coup d’état in Chile (log scale). b Comparison of Venezuelan and Colombian outputs before and after Chavez’s ascension to power in 1999 (linear scale)

An even more eloquent illustration is the case of Venezuela during the Chavez regime that started in 1999. A comparison of Venezuela with neighboring country Colombia, as presented in Fig. 9b, shows that the two countries, sharing long common histories, took entirely divergent paths. Colombian growth resembles the development of most Latin American countries for the period. In comparison, Venezuela’s output practically halted after 1999 (Fig. 9).

Vector auto-regressive model of influences

Last, we analyzed the dynamics of interactions between the scientific outputs of different countries. A Vector Auto Regressive (VAR) model was fitted to the set of time series characterizing the scientific production of the six most productive nations: USA, China, United Kingdom, Germany, India and Japan. To do so, each series was logarithmically transformed according to Eq. (1) and then differenced, as in Eq. (2). Each resulting series was tested for the existence of unit roots using an Augmented Dickey–Fuller test to check stationarity. All six passed the test (Footnote 1).
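The transformation described above amounts to taking the first difference of the logarithm of each series. The Python sketch below shows that step only. The function name is hypothetical, and the unit-root test is not reproduced here (in practice a library routine such as `adfuller` in statsmodels could be applied to the result).

```python
import math


def log_difference(counts):
    """First difference of the natural logarithm of a positive series.

    Each value approximates the relative growth between
    consecutive years: ln(x_t) - ln(x_{t-1}).
    """
    logs = [math.log(c) for c in counts]
    return [logs[t] - logs[t - 1] for t in range(1, len(logs))]
```

The output has one element fewer than the input, and a series growing at a constant annual rate maps to a constant sequence.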

A VAR model using up to 3 lags in all the variables was fitted to describe the relationship among the time variations of the number of papers for all countries. This model will be used for illustration purposes. Let the production of country i at time t be \(x_{i,t}\). The equation for the production of country i is a difference equation of the form

$$\begin{aligned} x_{i,t} =&\,a_{i1}^{(1)}\, x_{1,t-1} + \cdots + a_{i6}^{(1)}\, x_{6,t-1}+ a_{i1}^{(2)}\, x_{1,t-2} + \cdots + a_{i6}^{(2)}\, x_{6,t-2} \\&+\, a_{i1}^{(3)}\, x_{1,t-3} + \cdots + a_{i6}^{(3)}\, x_{6,t-3}+ \varepsilon _{i,t}, \end{aligned}$$
(4)

where \(\varepsilon _{i,t}\) is a white noise sequence. The coefficients are obtained by fitting the model to the data. Similar equations exist for the production of all six countries. Only the equation-level statistics of the estimation are reported here (Table 2). The individual coefficient estimates are not included but can be requested from the authors.
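The structure of Eq. (4) can be sketched in a few lines of Python. The function below evaluates the deterministic part of the equation for every variable at once, with the noise term set to its mean of zero. The coefficient matrices are passed in as plain nested lists and are placeholders: the fitted values are not reported here, and the function name is hypothetical.

```python
def var_predict(history, coeffs):
    """Expected next value of a VAR(p) model.

    history: list of past vectors, most recent last (length >= p).
    coeffs:  list [A1, ..., Ap] of n x n matrices, where
             Ak[i][j] multiplies x_{j,t-k} in the equation for x_{i,t}.
    The noise term is replaced by its mean (zero).
    """
    n = len(history[-1])
    prediction = [0.0] * n
    for lag, matrix in enumerate(coeffs, start=1):
        past = history[-lag]  # vector observed `lag` steps ago
        for i in range(n):
            prediction[i] += sum(matrix[i][j] * past[j] for j in range(n))
    return prediction
```

With six countries and three lags, `coeffs` would hold three 6 × 6 matrices, so each equation carries 18 coefficients, matching the terms of Eq. (4).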

Table 2 Vector autoregression estimation results

As Table 2 indicates, the fitted model explains a great deal (over 99%) of the variation around the mean values, as given by the corresponding \(R^2\) values. All the equations fitted for the countries are significant, as the p-values attest (they are all less than \(10^{-4}\)). A \(\chi ^2\) test was performed to see if the resulting residuals form a white noise sequence. The sum of squares RMSE is compared to a \(\chi ^2\) distribution with the given degrees of freedom.

The existence of a well fitted model for the production of all six countries means that the annual variations of these countries are not completely independent but they may be predicted based on the recent history of production for all the countries in the set. This implies the existence of a set of mutual influences that may explain the dynamics along time. The numerical values of the coefficients in the model quantify the strength of those influences.

After the model was fitted and its adequacy was tested (Footnote 2), an Impulse Response Function (IRF) was computed to evaluate the relative effects that a sudden change in the production of one of the countries has on the outputs of the others. The IRF describes the effect of a one-unit (one standard deviation) momentary (single time period) change in the value of one variable on all the variables included in the model. An impulse-type perturbation

$$\begin{aligned} \delta _t = {\left\{ \begin{array}{ll} 1, &{} t=0\\ 0, &{} t>0 \end{array}\right. }, \end{aligned}$$

is used, meaning that one of the variables in the model deviates momentarily from equilibrium; the resulting dynamics are then simulated to predict the effect on all the variables. The IRF is obtained by simulating the response of the difference equation with all variables, except one, initially set at zero. In the IRF plots, the continuous line is the expected value of the disturbance and the gray band around it is a 95% confidence interval. A good introduction to the use of Impulse Response Functions may be found in Lütkepohl (2005), who provides explanations and a meaningful discussion of the advantages and drawbacks of this tool.
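The simulation described above can be sketched as follows. This Python code is a minimal illustration, assuming a simple (non-orthogonalized) response to a unit impulse in a single variable. The IRF in Fig. 10 uses a one-standard-deviation shock and may rely on a different identification scheme, and no confidence band is computed here. The function name and the coefficient layout are hypothetical.

```python
def impulse_response(coeffs, shocked, horizon):
    """Response of a VAR(p) model to a unit impulse at time 0.

    coeffs:  list [A1, ..., Ap] of n x n matrices, where
             Ak[i][j] multiplies x_{j,t-k} in the equation for x_{i,t}.
    shocked: index of the variable receiving the impulse.
    Returns the list of state vectors for t = 0..horizon.
    """
    p = len(coeffs)
    n = len(coeffs[0])
    # All variables start at equilibrium (zero) before the impulse.
    path = [[0.0] * n for _ in range(p)]
    impulse = [0.0] * n
    impulse[shocked] = 1.0
    path.append(impulse)
    for _ in range(horizon):
        nxt = [
            sum(
                coeffs[k][i][j] * path[-1 - k][j]
                for k in range(p)
                for j in range(n)
            )
            for i in range(n)
        ]
        path.append(nxt)
    return path[p:]  # drop the zero padding before t = 0
```

For a stable model the returned vectors shrink toward zero as the horizon grows, which is the decay seen in the IRF plots.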

As the IRF graph (Fig. 10) shows, despite the increasing participation of China in world scientific production, the USA and the United Kingdom still have a greater impact on the other countries, in terms of the variation of the number of papers published, than Germany or China. Of these four countries, China still does not have a great impact on the other three, but it is affected by changes in all the others. The influence of the United Kingdom on the USA seems a little stronger than the reverse influence. Since the models consider only the number of papers, the growth of China, which is almost independent of the others, does not produce a strong joint variation. A more detailed model that takes co-citations into account may show a different picture. However, this is not what conventional wisdom holds at this moment.

Notice that the initial response of China to an increased production in the United Kingdom is negative. This is the result of a negative coefficient for the UK in the model for China’s production. A similar, but less pronounced, effect appears in Germany (DE) when the impulse is in the USA. This effect often appears when there is some competition for resources, such that increased production in one country has to come at the expense of another. Thus, one possible explanation for this behavior is the existence of some balance; i.e., it is possible that, in order to increase the United Kingdom’s production, some resources may need to be reduced in China, perhaps in the form of reduced cooperation. Notice also that the last row of plots shows that an increase in the production of the USA results in small reductions in Germany and the United Kingdom, but in increased production in China. Perhaps the cooperation of Chinese authors is stronger with the USA than with the United Kingdom.

It is worth noting that the responses to impulses in all countries, except the UK, decay rapidly and are over after about 6 years. For an impulse in the UK, it takes about 8 years to stabilize. This means that sudden increases in production do not have lasting effects. Sustained increases, however, may continue to influence the other countries’ production. The plots in Fig. 10 also indicate, based on the magnitude of the responses to a disturbance, that an impulse in the UK has an effect on China that is almost twice the size of the effect of an impulse in the USA. Another interesting result is that the variation in production is more abrupt in the USA than in all other countries; the amplitude of the variation is almost 50% higher than in the United Kingdom. The amplitude of the variation in Germany is about the same size as in the USA, but it takes 2 years to be attained, while in the USA this variation is achieved in a single year.

Fig. 10

Impulse Response Function for the four most productive countries: China (CN), Germany (DE), United Kingdom (UK) and United States (USA). The annotations indicate the names of the variables used in our study as follows: \({\mathsf{Country6}}\) is the identification of the IRF; the prefix \({\mathsf{D\_l\_}}\) before the country’s name indicates a first difference of the logarithm of the production; the suffix \({\mathsf{\_All}}\) refers to the publications in all fields of knowledge

Discussion

When analyzing the scientific production, it is indispensable to understand that the generation of knowledge and its scientific output are the result of a dynamical process whose current and future states depend on their values in the past.

Simple descriptive statistics of countries’ publication time series allow us to contrast the effects of social and political conditions on scientific production. This could, in turn, permit an assessment of the impact of policies and regimes on scientific and technological progress. For instance, we have demonstrated how the examination of the time series alone can provide valuable insights about the process of discovery and creation of knowledge.

Examination of the time series indicates that wars have caused significant impacts on the scientific output of almost all countries. Even some that were not directly involved have been affected by the conflicts. This is clearly seen both for WWII and the Korean conflict. A similar analysis has shown that major political disruptions in some countries have produced substantial, measurable, setbacks in their scientific outputs. The effects of authoritarian regimes in Chile and Venezuela have been studied.

Evidently, the scientific production of one country cannot be independent of other countries’ output. But the VAR model we obtained shows the extent of that dependency with respect to the annual rate of change of the production. The corresponding impulse response functions show that the rate of growth of China and Germany is apparently more highly correlated to changes in USA or United Kingdom than the other way around. Figure 10 also shows that in all cases the disturbances produce effects that die out in 8 years or less. It is also remarkable that the uncertainty (gray band) is smaller for changes in China than in any other country.

The VAR model also indicates that short term forecasts of one country’s production might be feasible based on its own output history and on the set of mutual influences with other countries.

In conclusion, time series analysis tools may be profitably utilized to study scientific development, based on bibliometric and socioeconomic data. We have presented here only a few possibilities.

The time series-based approach to scientometrics is only possible if reliable time series are available for the variables in which we are interested. This might become less and less of a limitation as new data become available each day. Notice that a wide variety of models might be necessary since a single type of model may not be appropriate in all situations.

As future perspectives, it would be worthwhile constructing rigorous time series concerning the onset and abandonment of different policies and the extent to which they are adopted, both at the national and international levels. Then we could quantify, by means of dynamic models of the form we presented, the predicted effects and contrast our estimation with the predictions made by past qualitative analyses. It would be very interesting to establish some predictions to be tested, ex post, for some past experiences where predictions have been made based on qualitative studies and whose results are already known. If this were possible, it might become feasible to compare the predictions of the time series models against the predictions already tested.

Another potential application of the models being proposed here could be to gather time series data for collaboration and co-citation between researchers from different countries in an attempt to quantify how much of the mutual influences are due to collaboration as opposed to simple emulation or competition. It could also be feasible to analyze aggregated data for regions like the European Union and contrast them to the production of isolated countries.

A final direction into which this work could be extended is the use of more recent developments in the area of time series analysis, including wavelets and nonlinear dynamical models.