Abstract
How two previously unacquainted scholars come to collaborate has been studied extensively from the perspective of social network analysis. In academia, two scholars may collaborate more than once, which means that scientific collaboration is, to some extent, sustainable. However, little research has explored the sustainability of scientific collaboration. In this paper, we examine the extent to which collaboration sustainability can be predicted. For this purpose, we devise CSTeller, a collaboration sustainability prediction model based on extreme gradient boosting. We analyze the sustainability of scientific collaboration from two perspectives: collaboration duration and the number of collaborations (collaboration times). We investigate factors that may affect collaboration sustainability, drawn from scholars' local properties and network properties, and adopt these factors as the input features of CSTeller. Extensive experiments on two real scholarly datasets demonstrate the effectiveness of the proposed model. To the best of our knowledge, this is the first attempt to explore the mechanism of scientific collaboration from the perspective of sustainability. Our work may shed light on scientific collaboration analysis and benefit practical applications such as collaborator recommendation, since a scientific collaboration is not a one-shot deal.
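To make the two sustainability measures concrete, the sketch below derives them from a toy list of publication records. It is an illustration only, not the CSTeller pipeline: the record format, the function names `pair_targets`, `neighbors`, and `common_neighbors`, and the choice of common co-authors as an example network property are all assumptions. In particular, duration is assumed here to be the number of years between a pair's first and last joint paper, and collaboration times the number of joint papers.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: (publication year, author list).
papers = [
    (2010, ["alice", "bob"]),
    (2012, ["alice", "bob", "carol"]),
    (2015, ["alice", "bob"]),
    (2013, ["bob", "carol"]),
    (2014, ["carol", "dave"]),
]

def pair_targets(papers):
    """Map each co-author pair to (duration_in_years, collaboration_times).

    Assumed definitions: duration is last joint year minus first joint
    year; collaboration times is the number of joint papers.
    """
    years = defaultdict(list)
    for year, authors in papers:
        # Sort so that every pair has one canonical (a, b) key.
        for a, b in combinations(sorted(set(authors)), 2):
            years[(a, b)].append(year)
    return {pair: (max(ys) - min(ys), len(ys)) for pair, ys in years.items()}

def neighbors(papers):
    """Build the co-authorship adjacency sets of the collaboration network."""
    nbrs = defaultdict(set)
    for _, authors in papers:
        for a, b in combinations(set(authors), 2):
            nbrs[a].add(b)
            nbrs[b].add(a)
    return nbrs

def common_neighbors(nbrs, a, b):
    """Count co-authors shared by a and b (one possible network property)."""
    return len((nbrs[a] & nbrs[b]) - {a, b})

targets = pair_targets(papers)
nbrs = neighbors(papers)
print(targets[("alice", "bob")])              # (duration, times) for one pair
print(common_neighbors(nbrs, "alice", "bob"))  # shared co-authors
```

In a prediction setting, features such as `common_neighbors` would presumably be computed only from records available before the period being forecast, and the resulting feature vectors and targets could then be passed to a gradient boosting learner.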
Acknowledgements
We thank Tong Gao for assistance with the experiments. This work was supported by the National Natural Science Foundation of China (NSFC) under Grants 61502071, 71774020, and 71473028, and by the Fundamental Research Funds for the Central Universities under Grant DUT18JC09.
Additional information
This article belongs to the Topical Collection: Special Issue on Social Computing and Big Data Applications
Guest Editors: Xiaoming Fu, Hong Huang, Gareth Tyson, Lu Zheng, and Gang Wang
Cite this article
Wang, W., Xu, B., Liu, J. et al. CSTeller: forecasting scientific collaboration sustainability based on extreme gradient boosting. World Wide Web 22, 2749–2770 (2019). https://doi.org/10.1007/s11280-019-00703-y
DOI: https://doi.org/10.1007/s11280-019-00703-y