Abstract
This paper proposes an empirical framework for classifying research collaboration activities. It develops indicators that build on an earlier theoretical framework (Wagner [Science and Technology Policy for Development, Dialogues at the Interface, 2006]; Wagner et al. [Linking effectively: Learning lessons from successful collaboration in science and technology. DB-345-OSTP, 2002]) and applies the Gaussian mixture model, a probabilistic clustering method. Building on an evidence-based reading of actual collaboration patterns, the paper also proposes an exploratory approach to managing and evaluating research projects according to their classification, from the perspective of research collaboration and R&D management. The results show that international collaboration tends to be associated with more evenly committed collaboration, and that collaboration featuring a higher degree of funding or dispersed commitments generally yields larger outcomes than research clustered on the opposite side of the framework.
Notes
Wagner (2006) lists the major motivations that organize international collaboration: increasing visibility among peers and exploiting complementary capabilities, sharing the costs of projects that are large in scale or scope, obtaining access or sharing expensive physical resources, achieving greater leverage by sharing their data, and exchanging ideas in order to encourage greater creativity.
European Organization for Nuclear Research.
International project to design and build an experimental fusion reactor.
International Space Station project.
Human Frontiers Science Program.
Human Genome Project.
Intergovernmental Panel on Climate Change.
Arctic research.
Ocean Drilling Program.
Wagner (2006) uses this frame to set out a taxonomy of international collaboration.
Akaike information criterion (AIC), Bayesian information criterion (BIC) and minimum description length criterion (MDL) are commonly used as criteria.
In Korea, these megascience programs are performed primarily by other institutes: the Korea Aerospace Research Institute, the National Fusion Research Institute, or the Pohang Accelerator Laboratory.
Rather than such academic topics, the department to which the co-authors belong, namely the department of system dynamics, focuses mainly on designs that reduce the noise created by industrial equipment, in order to meet ordinary national agendas.
Impact factor is gathered from SCI papers only. It may be considered biased, but no critical difference in ranking compared to SCI publications between the upper-half and lower-half groups can be found. This fact rather enhances the stratification between clusters.
Observation × dimension.
References
Abramo, G., D’Angelo, C. A., & Solazzi, M. (2011). The relationship between scientists’ research performance and the degree of internationalization of their research. Scientometrics, 86(3), 629–643.
Acedo, F. J., Barroso, C., Casanueva, C., & Galán, J. L. (2006). Co-authorship in management and organizational studies: An empirical and network analysis. Journal of Management Studies, 43(5), 957–983.
Bouman, C. A., Shapiro, M., Cook, G. W., Atkins, C. B., Cheng, H., Dy, J. G., et al. (2005). Cluster: An unsupervised algorithm for modeling Gaussian mixtures. West Lafayette: School of Electrical Engineering, Purdue University.
Breschi, S., & Malerba, F. (2011). Assessing the scientific and technological outcome of EU framework programmes: Evidence from the FP6 projects in the ICT field. Scientometrics, 88(1), 239–257.
Cheng, J., Yang, J., & Zhou, Y. (2005). A novel adaptive Gaussian mixture model for background subtraction. Lecture Notes in Computer Science, 3522, 587–593.
Cohen, W. M., Nelson, R. R., & Walsh, J. P. (2002). Links and impacts: The influence of public research on industrial R&D. Management Science, 48(1), 1–23.
Crane, D. (1972). Invisible colleges. Chicago: University of Chicago Press.
Crowston, K. (1994). A taxonomy of organisational dependencies and coordination mechanisms. MIT Center for Coordination Science Working Paper. Massachusetts Institute of Technology. Retrieved from http://ccs.mit.edu/ccsmainhtml. Accessed 1 June 2011.
Dempster, A., Laird, N., & Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1–38.
Edge, D. (1979). Quantitative measures of communication in science: A critical review. History of Science, 17, 102–134.
Frame, J. D. (1987). Managing projects in organizations. How to make best use of time, techniques, and people. San Francisco: Jossey-Bass.
Goffman, W., & Warren, K. S. (1980). Scientific information systems and the principle of selectivity. New York: Praeger.
Hagstrom, W. O. (1965). The scientific community. New York: Basic Books.
Han, D. S., Jang, D. H., Han, S. H., & Yang, J. M. (2008). An empirical study on the impacts of public funding on the research performance of academic faculties. Korean Public Administration Review, 42(4), 265–290.
Hansen, M. H., & Yu, B. (2001). Model selection and the principle of minimum description length. Journal of the American Statistical Association, 96(454), 746–774.
Hinings, C. R., & Greenwood, R. (1996). Working together. In P. J. Frost & S. M. Taylor (Eds.), Rhythms of academic life: Personal accounts of careers in Academia (pp. 225–237). Thousand Oaks: Sage.
Hoegl, M., & Gemuenden, H. G. (2001). Teamwork quality and the success of innovative projects: A theoretical concept and empirical evidence. Organization Science, 12(4), 435–449.
Kashyap, R. L. (1980). Inconsistency of the AIC rule for estimating the order of autoregressive models. IEEE Transactions on Automatic Control, 25(5), 996–998.
Katz, J. S., & Martin, B. R. (1997). What is research collaboration? Research Policy, 26, 1–18.
Laband, D. N., & Tollison, R. D. (2000). Intellectual collaboration. Journal of Political Economy, 108, 632–662.
Laudel, G. (2002). What do we measure by co-authorships? Research Evaluation, 11, 3–15.
Liao, C. H. (2011). How to improve research quality? Examining the impacts of collaboration intensity and member diversity in collaboration networks. Scientometrics, 86(3), 747–761.
Lundberg, J., Tomson, G., Lundkvist, I., Skar, J., & Brommels, M. (2006). Collaboration uncovered: Exploring the adequacy of measuring university-industry collaboration through co-authorship and funding. Scientometrics, 69, 575–589.
Martin, B., & Salter, A. (1996). The relationship between publicly funded basic research and economic performance. Report of the science and policy research Unit. East Sussex: University of Sussex.
Melin, G., & Persson, O. (1996). Studying research collaboration using co-authorships. Scientometrics, 36, 363–377.
Narin, F., & Whitlow, E. S. (1990). Measurement of scientific cooperation and coauthorship in CEC-related areas of science (report EUR 12900). Luxembourg: Office for Official Publications of the European Communities.
Newman, M. E. J. (2001). Scientific collaboration networks. Physical Review E, 64. doi:10.1103/PhysRevE.64.016131.
Payne, J. (1995). Management of multiple simultaneous projects: A state-of-the-art review. International Journal of Project Management, 13(3), 163–168.
Pfeffer, J., & Salancik, G. R. (1978). The design and management of externally controlled organization. New York: Harper and Row.
Piette, M. J., & Ross, K. L. (1992). An analysis of the determinants of co-authorship in economics. The Journal of Economic Education, 23, 277–283.
Rissanen, J. (1983). A universal prior for integers and estimation by minimum description length. Annals of Statistics, 11(2), 417–431.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6(2), 461–464.
Smith, D., & Katz, J. S. (2000). Collaborative approaches to research, HEFCE fund review of research policy and funding. East Sussex: University of Sussex.
Solla Price, D., & Beaver, D. (1966). Collaboration in an invisible college. American Psychologist, 21, 1011–1018.
Thomson, J. D. (1967). Organizations in action. New York: McGraw-Hill.
Traore, N., & Landry, R. (1997). On the determinants of scientists’ collaboration. Science Communication, 19, 124–140.
Um, I. (2011). The determinants of R&D budget. Doctoral dissertation, Kookmin University, Seoul.
Vafeas, N. (2010). Determinants of single authorship. EuroMed Journal of Business, 5, 332–344.
van Raan, A. F. J. (1998). The influence of international collaboration on the impact of research result. Scientometrics, 42, 423–428.
Wagner, C. S. (2005). Six case studies of international collaboration in science. Scientometrics, 62, 3–26.
Wagner, C. S. (2006). International collaboration in science and technology: Promises and pitfalls. In B. Louk & E. Rutger (Eds.), Science and technology policy for development, dialogues at the interface (pp. 165–176). London: Anthem Press.
Wagner, C. S., & Leydesdorff, L. (2005). Network structure, self-organization, and the growth of international collaboration in science. Research Policy, 34, 1608–1618.
Wagner, C. S., Staheli, L., Silberglitt, R., Wong, A., & Kadtke, J. (2002). Linking effectively: Learning lessons from successful collaboration in science and technology. DB-345-OSTP. Santa Monica: RAND.
Wagner, C., Yezril, A., & Hassell, S. (2000). International cooperation in research and development: An update to an inventory of US government spending, MR-1248. Santa Monica: RAND.
Woolley, A. W., Chabris, C. F., Pentland, A., Hashmi, N., & Malone, T. W. (2010). Evidence for a collective intelligence factor in the performance of human groups. Science, 330(6004), 686–688. doi:10.1126/science.1193147.
Appendix
Finding the maximum likelihood mixture density parameters via the EM algorithm can be performed as follows. AIC and MDL differ in their penalty terms. The MDL penalty reflects the total number of data values, NM (observations × dimensions), whereas the AIC penalty does not depend on the amount of data and therefore tends to over-fit the model. Thus, as the number of observations tends to infinity, the model order C estimated by AIC does not converge to the true value (Kashyap 1980). The MDL criterion, by contrast, selects the model order C by minimizing the MDL value, a code length that reflects both the number of data samples and the parameter vector. Consequently, unlike AIC, the MDL criterion does not suffer from over-fitting and generally guarantees a consistent estimator (Bouman et al. 2005). This study adopts the MDL criterion proposed by Rissanen (1983) for order identification.
Direct minimization of the MDL criterion is difficult, but if the cluster membership, \( x_{n} , \) of each observation were known, estimating the parameters \( \theta = \left( {\pi ,\mu ,W} \right) \) would be quite simple (Bouman et al. 2005).
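The difference between the two penalty terms can be made concrete with a small sketch. It assumes full-covariance Gaussian clusters, the common form AIC = −2 log L + 2L, and the MDL form −log L + (L/2) log(NM) used in the Cluster manual of Bouman et al. (2005); the function names `n_free_params`, `aic` and `mdl` are hypothetical and not taken from the paper.

```python
import math

def n_free_params(C, M):
    """Free parameters of a C-cluster mixture of M-dimensional Gaussians.

    Assumption: each cluster has one weight, M means and M(M+1)/2
    covariance entries; one weight is fixed because the weights sum to 1.
    """
    return C * (1 + M + M * (M + 1) // 2) - 1

def aic(log_lik, C, M):
    """AIC: the penalty 2L does not depend on the amount of data."""
    return -2.0 * log_lik + 2.0 * n_free_params(C, M)

def mdl(log_lik, C, M, N):
    """MDL: the penalty (L/2) * log(N*M) grows with the N*M data values."""
    return -log_lik + 0.5 * n_free_params(C, M) * math.log(N * M)

# The per-parameter MDL penalty grows with N; the AIC penalty stays fixed.
penalty_small = mdl(0.0, 3, 2, 50)
penalty_large = mdl(0.0, 3, 2, 5000)
```

The two criteria are on different scales (−2 log L versus −log L), so their values are not directly comparable; only the ranking of candidate orders C under each criterion is meaningful.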
The EM algorithm is a method that is generally used to find the MLE of the parameters, especially when the data is incomplete or has missing values (Dempster et al. 1977). Although the EM algorithm can be applied in cases where the data actually has missing values, it is generally applied in cases where optimizing the likelihood function is analytically intractable or the likelihood function can be simplified by assuming the existence of values for additional but missing or hidden parameters.
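As an illustration of the hidden-variable view, the following is a minimal sketch of one EM iteration for a one-dimensional Gaussian mixture, in which the unknown cluster memberships are replaced by posterior probabilities. The one-dimensional setting, the variance floor and the names `em_step` and `log_likelihood` are simplifying assumptions for illustration, not the implementation used in this study.

```python
import math

def gaussian_pdf(y, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def log_likelihood(data, weights, means, variances):
    """Log-likelihood of the data under the current mixture."""
    return sum(
        math.log(sum(w * gaussian_pdf(y, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for y in data
    )

def em_step(data, weights, means, variances):
    """One EM iteration; returns updated (weights, means, variances)."""
    C = len(weights)
    # E-step: posterior membership probability of each observation.
    resp = []
    for y in data:
        joint = [weights[k] * gaussian_pdf(y, means[k], variances[k])
                 for k in range(C)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    # M-step: weighted moment estimates maximise the expected log-likelihood.
    new_w, new_m, new_v = [], [], []
    for k in range(C):
        n_k = sum(r[k] for r in resp)  # soft count of cluster k
        mean_k = sum(r[k] * y for r, y in zip(resp, data)) / n_k
        var_k = sum(r[k] * (y - mean_k) ** 2 for r, y in zip(resp, data)) / n_k
        new_w.append(n_k / len(data))
        new_m.append(mean_k)
        new_v.append(max(var_k, 1e-6))  # floor avoids a degenerate variance
    return new_w, new_m, new_v

# Usage: two well-separated groups, deliberately poor starting values.
data = [0.0, 0.1, -0.1, 0.2, -0.2, 5.0, 5.1, 4.9, 5.2, 4.8]
params = ([0.5, 0.5], [-1.0, 6.0], [1.0, 1.0])
history = [log_likelihood(data, *params)]
for _ in range(20):
    params = em_step(data, *params)
    history.append(log_likelihood(data, *params))
```

Each iteration cannot decrease the log-likelihood, which is why `history` is a convenient convergence check.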
The EM algorithm first finds the expected value of the complete-data log-likelihood \( Q(\theta ;\theta^{(i)} ) \) with respect to the unknown data, \( x_{n} , \) given \( Y_{n} \) and \( \theta^{(i)} , \) which are the observations and the current parameter estimates of the clusters, respectively. Membership is represented by a probability function. This study adopts the merging approach taken by Rissanen (1983), which reduces the number of clusters from C to C − 1 by constraining the parameters of two clusters to be equal. In other words, if two clusters, a and b, are merged into a single cluster, their mean and covariance parameters are constrained to be equal, as denoted in Eqs. A.1 and A.2.
where \( \mu_{(a,b)} \) and \( W_{(a,b)} \) denote the mean and covariance of the new cluster and \( d(a,b) \) denotes the distance function. In addition, \( \theta^{*} \) and \( \theta^{*}_{(a,b)} \) are the unconstrained and constrained optima. In particular, if the EM algorithm has been run to convergence for a fixed order \( C,\,\theta^{*} \) equals \( \theta^{(i)} , \) which satisfies \( Q(\theta^{(i)} ;\theta^{(i)} ) - Q(\theta^{*} ;\theta^{(i)} ) = 0 \). The value of \( \theta^{*}_{(a,b)} \) is obtained by maximizing \( Q(\theta_{(a,b)} ;\theta^{(i)} ) \) as a function of \( \theta_{(a,b)} , \) subject to the constraints.
Among all cluster pairs, the pair \( (a^{*} ,b^{*} ) \) that minimizes the distance function, and with it an upper bound on the change in the MDL criterion, is merged. The parameters of the merged cluster are then calculated and used as initial values for another EM optimization with C − 1 clusters (Bouman et al. 2005).
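The equality constraint and the distance function can be sketched in the following form. This is an assumption based on the definitions given here and on the merging procedure described by Bouman et al. (2005), and it is not a verbatim reproduction of Eqs. A.1 and A.2:

\[ \mu_{a} = \mu_{b} = \mu_{(a,b)} , \qquad W_{a} = W_{b} = W_{(a,b)} \]

\[ d(a,b) = Q(\theta^{*} ;\theta^{(i)} ) - Q(\theta^{*}_{(a,b)} ;\theta^{(i)} ) \]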
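The merge step can be sketched for the one-dimensional case. The sketch assumes that each cluster is summarised by its soft count, mean and variance after EM has converged, so that the merged parameters are the pooled moments and the distance reduces to a closed form in the variances; the names `merge_pair`, `merge_distance` and `merge_closest` are hypothetical.

```python
import math
from itertools import combinations

def merge_pair(n_a, mean_a, var_a, n_b, mean_b, var_b):
    """Pooled (count, mean, variance) of two clusters forced to share parameters."""
    n_ab = n_a + n_b
    mean_ab = (n_a * mean_a + n_b * mean_b) / n_ab
    var_ab = (n_a * (var_a + (mean_a - mean_ab) ** 2)
              + n_b * (var_b + (mean_b - mean_ab) ** 2)) / n_ab
    return n_ab, mean_ab, var_ab

def merge_distance(n_a, mean_a, var_a, n_b, mean_b, var_b):
    """Drop in the expected log-likelihood Q caused by merging a and b.

    Assumption: 1-D clusters at their maximum likelihood estimates, where
    the drop is (n_a/2) log(var_ab/var_a) + (n_b/2) log(var_ab/var_b).
    """
    _, _, var_ab = merge_pair(n_a, mean_a, var_a, n_b, mean_b, var_b)
    return (0.5 * n_a * math.log(var_ab / var_a)
            + 0.5 * n_b * math.log(var_ab / var_b))

def merge_closest(clusters):
    """Merge the pair with the smallest distance; returns C - 1 clusters.

    clusters is a list of (soft_count, mean, variance) tuples.
    """
    a, b = min(
        combinations(range(len(clusters)), 2),
        key=lambda p: merge_distance(*clusters[p[0]], *clusters[p[1]]),
    )
    merged = merge_pair(*clusters[a], *clusters[b])
    rest = [c for i, c in enumerate(clusters) if i not in (a, b)]
    return rest + [merged]

# Usage: the two nearby clusters are merged, the distant one is kept.
clusters = [(10.0, 0.0, 1.0), (10.0, 0.5, 1.0), (10.0, 8.0, 1.0)]
reduced = merge_closest(clusters)
```

The returned list would then serve as the starting point for EM with C − 1 clusters, and the order with the smallest MDL value over the whole sequence is retained.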
Cite this article
Jeong, S., Choi, J.Y. The taxonomy of research collaboration in science and technology: evidence from mechanical research through probabilistic clustering analysis. Scientometrics 91, 719–735 (2012). https://doi.org/10.1007/s11192-012-0686-9
DOI: https://doi.org/10.1007/s11192-012-0686-9